Crypto, data analysis and BI (business intelligence), data mining and Bitcoin: What is statistical significance (p value)?

The personal blog of a business intelligence practitioner, long tucked away in a corner of Europe or on some remote island. A computer engineer's weblog, created on Dec 19th, 2004. It mainly covers: research and reviews of blockchain whitepapers and technologies; computer-related knowledge and experience (rev. 2007: data mining, business intelligence, data analysis); and links to the best BBSs and resources, so that when you need specific information you can go to those discussion forums for help.

Thursday, January 03, 2008

What is statistical significance (p value)?

http://www.statsoft.com/textbook/esc.html

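The lower the p value, the less likely it is that the relation between variables observed in our sample is a chance artifact. More precisely, a low p value means such a sample would be unlikely if there were no relation in the population.

Here is the brief explanation from SPSS: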

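1. State the null and alternative hypotheses, H0 and H1.
2. Choose an appropriate test statistic for testing H0.
3. Set the significance level α.
4. Compute the value of the test statistic.
5. Compute the p value from that realized value.
6. Make the statistical decision.
(If the p value is less than or equal to α, reject the null hypothesis; the probability of making an error by doing so is at most α. If the p value is greater than α, do not reject the null hypothesis, because the evidence is insufficient.)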
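
The six steps can be sketched in a few lines of Python. This is only an illustration under assumptions of my own, not the SPSS procedure: it uses a two-sided one-sample z test with a known population standard deviation, and the function name `two_sided_z_test` and the sample values are hypothetical.

```python
import math

def two_sided_z_test(sample, mu0, sigma, alpha=0.05):
    """Walk through the six steps for a two-sided one-sample z test."""
    # Step 1: H0: population mean == mu0; H1: population mean != mu0.
    # Step 2: test statistic z = (sample mean - mu0) / (sigma / sqrt(n)).
    # Step 3: the significance level alpha is chosen by the caller.
    n = len(sample)
    mean = sum(sample) / n
    # Step 4: value of the test statistic for this sample.
    z = (mean - mu0) / (sigma / math.sqrt(n))
    # Step 5: two-sided p value under the standard normal distribution,
    # 2 * (1 - Phi(|z|)), written with the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Step 6: reject H0 when the p value is at most alpha.
    reject = p_value <= alpha
    return z, p_value, reject

# Hypothetical measurements, tested against mu0 = 5.0 with sigma = 0.5.
sample = [5.1, 4.9, 5.6, 5.8, 5.3, 5.7, 5.4, 5.9]
z, p_value, reject = two_sided_z_test(sample, mu0=5.0, sigma=0.5)
print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

With real data one would more often use a t test, because the population standard deviation is rarely known. The z test is used here only because it needs nothing beyond the standard library.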

The higher the p-value, the less we can believe that the observed relation between variables in the sample is a reliable indicator of the relation between the respective variables in the population. Specifically, the p-value represents the probability of error that is involved in accepting our observed result as valid, that is, as "representative of the population." For example, a p-value of .05 (i.e., 1/20) indicates that there is a 5% probability that the relation between the variables found in our sample is a "fluke." In other words, assuming that in the population there was no relation between those variables whatsoever, and we were repeating experiments like ours one after another, we could expect that approximately in every 20 replications of the experiment there would be one in which the relation between the variables in question would be equal or stronger than in ours.
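
The "one in every 20 replications" reading can be checked with a small simulation. The sketch below assumes a setup of my own choosing: every experiment draws 30 values from a standard normal population in which the null hypothesis (mean 0) really is true, and applies a two-sided z test. The function name `false_positive_rate` and all parameters are hypothetical.

```python
import math
import random

def false_positive_rate(replications=2000, n=30, alpha=0.05, seed=0):
    """Fraction of replications with p <= alpha when H0 is true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(replications):
        # The population mean really is 0, so H0 holds in every replication.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / math.sqrt(n))
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
        if p_value <= alpha:
            hits += 1
    return hits / replications

rate = false_positive_rate()
print(f"share of replications with p <= 0.05: {rate:.3f}")
```

The printed share should land close to 0.05, that is, roughly one "significant" result per 20 experiments even though no effect exists.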

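An aside: among 10 random variables (45 pairwise correlations), we should expect to find about 2 "correlations" that are significant at p<0.05 purely by chance.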
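
The arithmetic behind this aside is short enough to write out. 45 pairwise correlations correspond to 10 variables, since C(10, 2) = 45. The last line assumes the 45 tests are independent, which is only an approximation for correlations computed among the same variables.

```python
import math

variables = 10
alpha = 0.05

# Number of distinct pairs among 10 variables: C(10, 2) = 45.
pairs = math.comb(variables, 2)

# Expected number of correlations significant at alpha by chance alone.
expected_false_positives = pairs * alpha

# Chance of at least one spurious "significant" correlation,
# ASSUMING the tests are independent (an approximation).
p_at_least_one = 1 - (1 - alpha) ** pairs

print(pairs, expected_false_positives, round(p_at_least_one, 3))
```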

A small relation can be shown to be significant only in a large sample.

Consider the following additional illustration. If a coin is slightly asymmetrical, and when tossed is somewhat more likely to produce heads than tails (e.g., 60% vs. 40%), then ten tosses would not be sufficient to convince anyone that the coin is asymmetrical, even if the outcome obtained (six heads and four tails) was perfectly representative of the bias of the coin. However, is it so that 10 tosses is not enough to prove anything? No, if the effect in question were large enough, then ten tosses could be quite enough. For instance, imagine now that the coin is so asymmetrical that no matter how you toss it, the outcome will be heads. If you tossed such a coin ten times and each toss produced heads, most people would consider it sufficient evidence that something is "wrong" with the coin. In other words, it would be considered convincing evidence that in the theoretical population of an infinite number of tosses of this coin there would be more heads than tails. Thus, if a relation is large, then it can be found to be significant even in a small sample.
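
The coin example can be made concrete with exact binomial tail probabilities. The sketch below assumes a one-sided test of H0 "the coin is fair" against "heads are more likely". The helper `upper_tail` is a hypothetical name, and the 1000-toss case is my own addition to show the effect of sample size.

```python
import math

def upper_tail(heads, tosses, p=0.5):
    """P(X >= heads) for X ~ Binomial(tosses, p): a one-sided p value."""
    return sum(
        math.comb(tosses, k) * p**k * (1 - p) ** (tosses - k)
        for k in range(heads, tosses + 1)
    )

p_6_of_10 = upper_tail(6, 10)          # small effect, small sample
p_10_of_10 = upper_tail(10, 10)        # large effect, small sample
p_600_of_1000 = upper_tail(600, 1000)  # small effect, large sample

print(f"6 heads in 10 tosses:      p = {p_6_of_10:.4f}")
print(f"10 heads in 10 tosses:     p = {p_10_of_10:.6f}")
print(f"600 heads in 1000 tosses:  p = {p_600_of_1000:.2e}")
```

Six heads in ten tosses gives p = 386/1024, about 0.38, which convinces nobody. Ten heads in ten gives p = 1/1024, and the same 60% share of heads over 1000 tosses gives a vanishingly small p value.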

3 Comments:

hunter said...: A reading of the answers to my question about the p value in hypothesis testing

Source: 6sigma品质网 (6sigma Quality Network), www.6sq.net. Author: 欧立威. Original post: http://bbs.6sq.net/viewthread.php?tid=50923

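Original question:
Poll title: [Other tools] One more question, which many people believe they already understand (single choice)
Whenever we run a hypothesis test in MINITAB we end up using the p value. Which of the following statements describes the p value correctly?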

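1. The p value is the actual probability of being wrong when rejecting the null hypothesis. Votes: 1 (5.00%)
2. If the p value is greater than 0.05, the null hypothesis should be accepted. Votes: 0 (0.00%)
3. If the p value is less than 0.05, the null hypothesis should be rejected. Votes: 4 (20.00%)
4. The p value is the power of the experiment. Votes: 0 (0.00%)
5. With other conditions unchanged, the p value decreases as Delta increases. Votes: 0 (0.00%)
6. With other conditions unchanged, the p value decreases as the sample size n increases. Votes: 0 (0.00%)
7. Alpha is the significance level we set; the p value is the actual alpha risk. Votes: 11 (55.00%)
8. In a one-sided test the p value is compared with alpha; in a two-sided test it should be compared with alpha/2. Votes: 2 (10.00%)
9. The p value is the overlapping, indistinguishable region when two distributions are compared. Votes: 1 (5.00%)
10. In a Minitab normality test, a p value greater than 0.10 means the data are treated as normally distributed by default. Votes: 1 (5.00%)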

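Allio gave a very detailed answer to this question. In another post he describes himself as having done Six Sigma at Moto in the US for eight years, having been an MBB six years ago already, and as lecturing and writing books, so his level is evidently not low. I will therefore take his answer and annotate it:

Allio's original answer was in black and my annotations in red. In this copy the annotations appear in square brackets.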
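First, we must assume that P here means the p value, not power (Power = 1 - Delta) [Power = 1 - Beta, not Delta]. Strictly understood, the p value is the probability, when the null hypothesis holds, of observing the current outcome or one even less favorable to the null hypothesis. [If the strict definition of the p value were this vague, rather than the mathematical definition of an error rate, how would MINITAB be able to compute it?]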
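1) Correct. The p value considers how unusual the observed outcome is, that is, how small the error in rejecting the null hypothesis is, so the statement is correct. (Strictly speaking it is wrong and could be reworded as "the p value is the probability that rejecting the null hypothesis is an error for the current sample", but marking it wrong would feel like nitpicking.) [This option is correct. The strict definition he mentions does not exist. The p value is obviously computed from the current sample; could it use any other sample?]
2) Wrong, obviously. [He chose correctly, but I do not find it obvious, for two reasons. 1. In practice many people are puzzled when p is close to but greater than 0.05: should the null hypothesis be rejected or not? For example, what if the p value for a factor in an ANOVA is 0.06? Values of p between 0.05 and 0.10 have to be judged case by case, depending mainly on how much alpha risk you accept. 2. Even if p > alpha, that does not mean the null hypothesis can be accepted, only that it cannot be rejected. These two conclusions are not quite the same.]
3) Wrong; it does not consider the two-sided test. [Right choice, wrong reason. The difference between two-sided and one-sided tests lies in the alternative hypothesis Ha and has nothing to do with this option. The option is wrong because it does not state that alpha = 0.05.]
4) Wrong; the concepts are confused. [Right choice, but power is not 1 - Delta.]
5) Not selected. Wrong. The p value is a number computed from the sample distribution; it has nothing to do with the chosen Delta or with the settings of the alpha or beta risks.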
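6) Correct. Intuitively, the larger n is, the narrower the N(0,1), t or F distribution becomes, so for the same value of the test statistic p is smaller; in fact the value of the test statistic also grows. (The alpha and Delta risks both become smaller.)
[Wrong choice. The p value is a number computed from the sample distribution. Sample size only affects how inaccurate the estimate of the population distribution is. As for the p value, until the sample is large enough to represent the population, changing the sample size may change the p value, but it can go either up or down depending on the actual data. Allio's misunderstanding has two parts. 1. The statement that the t or F distribution becomes narrower as n grows refers to the distribution of the sample mean; it has no definite relation to the p value of a hypothesis test that draws one sample to infer about the population. 2. The claim that the alpha and Delta risks both become smaller as n grows is wrong, because both are decision criteria set by people and do not change with the particular sample.]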
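7) Correct, same as 1). [Right choice. What we usually call alpha is only the level of risk of rejecting a true null hypothesis that we are willing to accept, also called the significance level. The p value is that same risk as computed from the sample. If it is larger than the acceptable risk alpha that we set, we cannot be sure that rejecting the null hypothesis is right.]
8) Not selected. Wrong. Whether the test is one-sided or two-sided, in the end the p value from our sample is compared with alpha. For example, with alpha = 0.05, a one-sided t test in MINITAB takes the computed value of U1 - U2, locates it on the distribution curve, and compares (1 - one-sided cumulative probability), i.e. the p value, with 0.05. A two-sided test places the critical values T0 at 0.025 in each tail of the distribution, then uses the t value computed from the sample to add up the probabilities in both tails of the t distribution. That sum is the p value, and it is compared with alpha = 0.05, not with alpha/2 = 0.025.
9) Wrong. The question is poorly posed, with too many categories, and the overlapping region certainly includes the Delta risk. [Right choice, but it has nothing to do with Delta risk. This option was included to catch half-informed trainers who give students vague explanations; a great many instructors use this explanation to fob off students' questions about the p value.]
10) Wrong. H0: the distribution is normal; H1: it is not normal. With p > 0.10 we fail to reject the null hypothesis, but we do not have enough evidence to prove that the distribution is normal. [Wrong choice. In the AD normality test the usual criterion is that p > 0.10 means the risk of error in rejecting the null hypothesis is high, so the null hypothesis cannot be rejected, and when p is greater than 0.1 the null hypothesis can be treated as holding. That is why 0.1 is used instead of 0.05. When you want to reject the null hypothesis, the smaller the p value the better; when you want to accept it, the larger the better. What has to be judged here is the beta risk, because in an ordinary comparison of two distributions the beta risk changes enormously between alpha = 0.1 and alpha = 0.05.]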

The normality tests evaluate the null hypothesis (H0) that the data follow a normal distribution. If the p-value for the test is less than your chosen α-level, then you must reject H0 and conclude that your data do not follow a normal distribution.
The p-value for the Anderson-Darling normality test (bottom right) of the cooking oil data is 0.970. This value is greater than the chosen α-level of 0.10, thus the dietician will not reject H0. There is not enough evidence to suggest that the data do not follow a normal distribution. (copied from Minitab help) [Since Allio has already acknowledged under option 10 that H0 is "the distribution is normal", doesn't the highlighted "not reject H0" in the example above simply mean that the null hypothesis cannot be rejected? Since it cannot be rejected and p > 0.10, we can only accept the null hypothesis, that is, the null hypothesis of a normal distribution holds.]
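
Point 8 in the list above can be checked numerically. The sketch below uses the standard normal distribution in place of the t distribution (my own simplification, since the Python standard library has no t distribution), and the helper names are hypothetical. It shows that comparing the two-sided p value with alpha gives the same decision as comparing |z| with the critical value that puts alpha/2 in each tail.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def p_one_sided(z):
    """Upper-tail p value: 1 - cumulative probability at z."""
    return 1 - std_normal.cdf(z)

def p_two_sided(z):
    """Two-sided p value: sum of the probabilities in both tails."""
    return 2 * (1 - std_normal.cdf(abs(z)))

alpha = 0.05
# Critical value that leaves alpha/2 = 0.025 in each tail (about 1.96).
critical = std_normal.inv_cdf(1 - alpha / 2)

for z in (0.5, 1.8, 2.3, -2.6):
    by_p_value = p_two_sided(z) <= alpha   # p value compared with alpha
    by_critical = abs(z) >= critical       # statistic compared with T0
    print(z, round(p_two_sided(z), 4), by_p_value, by_critical)
```

For z = 1.8 the one-sided p value is below 0.05 while the two-sided p value is above it, which is why the choice between a one-sided and a two-sided alternative matters even though both p values are compared with the same alpha.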

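I posed this question in the first place because so many people have been taught all sorts of wrong ideas by mistaken instructors. My first question, about control charts, was in the same spirit. The aim is to clear out wrong concepts; what each person chose is not very important. From each person's reasoning I can more or less tell at what level they are confused, especially consultants and book authors. If they are confused themselves, they can mislead a great many people.
I hope Allio will understand and not think I am holding a grudge. Finding a post that is wrong yet written at a high level is not easy either.; May 23, 2009 at 10:27 AM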
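Anonymous said...: Generally, if P<0.05 the null hypothesis is rejected, and we conclude that the two groups of data differ, or that the two treatments differ in efficacy, or that the 7 days of the week differ in risk.; May 4, 2010 at 1:57 PM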
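Anonymous said...: The t test is a parametric test.

For a single sample, run the Kolmogorov-Smirnov Z test to see whether the data fit a normal distribution.

Calling the Runs procedure performs a runs test, which checks the randomness of the sequence in which events occur.

The binomial test gives a two-sided probability of 0.0177, so the difference in the male/female ratio can be considered highly significant.

K Independent Samples compares n independent sample groups.
Calling this procedure runs a median test and the Kruskal-Wallis H test on several independent samples.

The χ2 value (i.e. the H value) is 18.1219, p=0.0001; May 5, 2010 at 8:11 AM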
