Monday, June 25, 2007

Accuracy, Lift, Hit Rate (Precision), and Recall

http://www.cs.cornell.edu/courses/cs678/2006sp/performance_measures.4up.pdf

http://www2.cs.uregina.ca/~dbd/cs831/notes/lift_chart/lift_chart.html

Confusion matrix
          Pred1   Pred0
True1     a       b
True0     c       d

Accuracy = (a+d)/(a+b+c+d)

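Lift = (a/(a+b)) / ((a+c)/(a+b+c+d)); the share of actual positives the model captures, divided by the share of all cases it flags. Equivalently, the hit rate with the model divided by the natural positive rate.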

Precision = a/(a+c) (when low, a large share of the flagged cases are incorrect)

Recall = a/(a+b) (when low, a large share of the actual positives go undetected)
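
As a quick check on the formulas above, here is a minimal Python sketch that computes all four measures from the cells of the confusion matrix. The function name `confusion_metrics` and the sample counts are made up for illustration, and the sketch assumes every denominator is nonzero.

```python
def confusion_metrics(a, b, c, d):
    """Compute accuracy, precision, recall and lift from confusion-matrix counts.

    a: True1/Pred1, b: True1/Pred0, c: True0/Pred1, d: True0/Pred0.
    Assumes a+b, a+c and the total are all nonzero.
    """
    total = a + b + c + d
    accuracy = (a + d) / total
    precision = a / (a + c)          # share of flagged cases that are real positives
    recall = a / (a + b)             # share of real positives that were flagged
    flagged_share = (a + c) / total  # share of all cases the model flags
    lift = recall / flagged_share
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "lift": lift,
    }


# Hypothetical counts, chosen only to illustrate the calculation.
m = confusion_metrics(a=30, b=10, c=20, d=140)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Lift can also be computed as precision divided by the natural positive rate (a+b)/(a+b+c+d); the two forms are algebraically identical.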

The precision of the classifier is defined as the percentage of actually insolvent customers among those predicted as insolvent by the classifier.
The recall of the classifier is defined as the percentage of correctly predicted insolvent customers out of all insolvent customers in the data set.

Precision and Recall

http://groups.google.com/group/ttnn/browse_thread/thread/389a120f2b7a51f5

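The monthly ratio of customers in arrears to customers not in arrears does not follow the 80/20 rule; the arrears rate is only about 2%, which is very low. In the final validation, if there are actually 10,000 customers in arrears, the model places about 20,000 people in grades D and E, of whom about 8,000 are hits (truly in arrears). That is a hit rate of 40% and a recall of 80%, which seems like an acceptable result.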
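
The figures in this post can be placed into the confusion matrix to verify the quoted rates. The sketch below does that in Python. It assumes the roughly 2% arrears rate applies to the validation population, which the post does not state explicitly; the total of 500,000 customers and the cell `d` follow only from that assumption.

```python
# Figures quoted in the post.
actual_in_arrears = 10_000  # customers truly in arrears
flagged = 20_000            # customers the model places in grades D and E
hits = 8_000                # flagged customers who are truly in arrears

# ASSUMPTION: the ~2% arrears rate holds for the validation population,
# so the total is back-calculated rather than taken from the post.
arrears_rate_pct = 2
total = actual_in_arrears * 100 // arrears_rate_pct

# Confusion-matrix cells (rows = truth, columns = prediction).
a = hits                      # True1 / Pred1
b = actual_in_arrears - hits  # True1 / Pred0 (missed)
c = flagged - hits            # True0 / Pred1 (false alarms)
d = total - a - b - c         # True0 / Pred0 (depends on the assumption)

precision = a / (a + c)
recall = a / (a + b)
lift = recall / ((a + c) / total)
accuracy = (a + d) / total

print(f"a={a} b={b} c={c} d={d}")
print(f"precision={precision:.2f} recall={recall:.2f}")
print(f"lift={lift:.1f} accuracy={accuracy:.3f}")
```

Precision and recall come out at 40% and 80% regardless of the assumption. Under the assumed 2% rate, lift is about 20 and accuracy about 97.2%, which is lower than the 98% that flagging nobody would score. That is one reason precision and recall are the more informative measures for such a rare class.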

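--- I built a similar model too, with a hit rate of 50% and a recall of 60%, which seems worse than yours; judging by the results, yours is fine. In practice, though, clients usually want the hit rate to be as high as possible and do not seem to care much about recall, probably because reviewing the flagged bad samples costs them too much. Also, if the hit rate is very low, the client loses face when reporting the results upward.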
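
To make the review-cost argument concrete, the following Python sketch converts each pair of rates into the number of customers a client would have to review. It assumes, purely for comparison, that both models are scored against the same 10,000 customers who are truly in arrears; the reply does not say so. The helper name `review_workload` is hypothetical.

```python
def review_workload(actual_positives, precision_pct, recall_pct):
    """Turn precision/recall percentages into review counts.

    Integer percentages keep the arithmetic exact.
    """
    hits = actual_positives * recall_pct // 100  # true positives found
    flagged = hits * 100 // precision_pct        # everyone the client must review
    return {"flagged": flagged, "hits": hits, "false_alarms": flagged - hits}


# ASSUMPTION: both models face the same 10,000 true positives.
first_model = review_workload(10_000, precision_pct=40, recall_pct=80)
second_model = review_workload(10_000, precision_pct=50, recall_pct=60)

print("first model :", first_model)
print("second model:", second_model)
```

Under that assumption, the first model finds 2,000 more customers in arrears but asks the client to review 8,000 more cases, 6,000 of which are false alarms. This is the trade-off the reply describes.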
