Statistics has two major branches: descriptive statistics and inferential statistics. Within inferential statistics, interval estimation plays an important role.
Confidence intervals are the most commonly used form of interval estimation.
They estimate population parameters (such as the mean, standard deviation, or a proportion) from sample data; the uncertainty comes from sampling error, since different samples can yield different estimates of the target parameter.
The interpretation is illustrated in the figure below:
A 95% confidence interval means the following: draw 100 samples from the same population, with the population mean as the target. The 100 different samples yield 100 different confidence intervals, and about 95 of those intervals will contain the population parameter (here, the population mean).
When discussing confidence intervals, two points deserve attention:

1. As the sample size grows, sampling error shrinks and the confidence interval narrows. In the limit where the sample equals the entire population, there is no sampling error and the interval collapses to the sample statistic itself.
2. A confidence interval only gives a plausible range for the population parameter; it says nothing about how individual values are distributed.
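The coverage interpretation above is easy to check with a short simulation (a sketch in Python; the population mean, standard deviation, and sample size are made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, trials = 50.0, 10.0, 40, 10_000   # assumed population and design

covered = 0
for _ in range(trials):
    sample = rng.normal(mu, sigma, n)
    # 95% t-interval for the population mean from this sample
    low, high = stats.t.interval(0.95, n - 1,
                                 loc=sample.mean(),
                                 scale=stats.sem(sample))
    covered += low <= mu <= high

print(covered / trials)   # close to 0.95
```

Rerunning with a larger `n` shows the second point as well: the intervals get narrower while the coverage rate stays near 95%.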
A prediction interval is an interval estimate for the predicted value of a new observation obtained from a model (for example, a linear model).
A prediction interval is generally wider than the corresponding confidence interval (for a prediction, the confidence interval can be taken to target the mean predicted response). The confidence interval accounts only for sampling error, while the prediction interval must also account for the uncertainty of an individual prediction.
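The width difference can be seen by computing both intervals for a simple linear model by hand (a sketch using the standard OLS formulas; the data are simulated for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 3.0 * x + rng.normal(0, 2.0, n)    # made-up linear data

# Ordinary least squares fit
b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)
s = np.sqrt(resid @ resid / (n - 2))          # residual standard error
sxx = ((x - x.mean()) ** 2).sum()

x0 = 5.0                                      # point at which we predict
y0 = b0 + b1 * x0
t = stats.t.ppf(0.975, df=n - 2)
se_mean = s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / sxx)
se_pred = s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)

ci = (y0 - t * se_mean, y0 + t * se_mean)     # CI for the mean response
pi = (y0 - t * se_pred, y0 + t * se_pred)     # PI for one new observation
print(ci, pi)
```

The extra `1 +` inside `se_pred` is exactly the individual-prediction uncertainty described above, so the prediction interval is always the wider of the two.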
A tolerance interval builds on the confidence interval by adding one more parameter: the proportion of the population to be covered.
In the figure above, at a 95% confidence level, at least 95% of the bulb lifetimes fall within the interval (1060, 1435).
Tolerance intervals are typically used when strict coverage requirements must be met, which is achieved by adjusting the population-proportion parameter.
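As an illustration, a two-sided normal tolerance interval can be approximated with Howe's k-factor method (a sketch; the bulb-lifetime data here are simulated, not the data behind the figure):

```python
import numpy as np
from scipy import stats

def tolerance_interval(data, coverage=0.95, confidence=0.95):
    """Two-sided normal tolerance interval via Howe's k-factor approximation."""
    n = len(data)
    df = n - 1
    z = stats.norm.ppf((1 + coverage) / 2)
    chi2 = stats.chi2.ppf(1 - confidence, df)   # lower chi-square quantile
    k = z * np.sqrt(df * (1 + 1 / n) / chi2)
    m, s = np.mean(data), np.std(data, ddof=1)
    return m - k * s, m + k * s

rng = np.random.default_rng(2)
lifetimes = rng.normal(1250, 75, 100)   # made-up bulb lifetimes
low, high = tolerance_interval(lifetimes)
print(low, high)
```

Because the k-factor exceeds the plain normal quantile for any finite sample, the tolerance interval is wider than a naive "mean ± 1.96 s" range; raising `coverage` or `confidence` widens it further.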
Hypothesis testing uses sample data to choose between two mutually exclusive hypotheses: the null hypothesis and the alternative hypothesis.
A test result is statistically significant if and only if the sample data are so unusual under the null hypothesis (which posits a lack of difference) that we can reject the null hypothesis for the population.
"Unusual" is determined by three elements:

- the null hypothesis: the sampling distribution is centered on the null value
- the significance level: how far from the null value the critical lines are drawn
- the sample data: whether the sample statistic falls outside the critical lines

The significance level is sometimes also called the error rate, for the following reason: suppose α = 0.05 and the null hypothesis is in fact true. Then there is a 0.05 probability that, by chance, the sample statistic falls far enough from the null value to reject it, producing a wrong conclusion. This kind of error does not mean the experiment was flawed; it is simply unlucky random sampling error.
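This reading of α as an error rate can be verified by repeatedly testing samples drawn from a population where the null hypothesis is true (a sketch with made-up parameters):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, n, trials = 0.05, 30, 10_000
mu0 = 0.0                                   # the null hypothesis is true here

rejections = 0
for _ in range(trials):
    sample = rng.normal(mu0, 1.0, n)        # sampling from the null population
    _, p = stats.ttest_1samp(sample, mu0)
    rejections += p < alpha

print(rejections / trials)   # close to alpha = 0.05
```

Every one of these rejections is a Type I error, and they occur at almost exactly the rate α, which is why α is called the error rate.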
The following indicators are commonly used to check statistical significance:
Here is an example. First, set the significance level to 0.05. If the data are significant, then:

1. the P value is less than 0.05 (P values are computed assuming the null hypothesis holds; a small enough P value means the sample is far enough from the null to reject it)
2. the confidence interval does not contain the null-hypothesis value
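For a one-sample t-test, the two criteria are in fact equivalent: the P value drops below 0.05 exactly when the 95% confidence interval excludes the null value. A quick sketch (the null value and data are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
null_mean = 100.0
sample = rng.normal(105.0, 5.0, 40)   # true mean actually differs from the null

t_stat, p = stats.ttest_1samp(sample, null_mean)
low, high = stats.t.interval(0.95, len(sample) - 1,
                             loc=sample.mean(), scale=stats.sem(sample))

# The two criteria agree: p < 0.05 exactly when the 95% CI excludes 100
print(p < 0.05, not (low <= null_mean <= high))
```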
In technical terms, a P value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the null hypothesis.
For example, suppose that a vaccine study produced a P value of 0.04. This P value indicates that if the vaccine had no effect, you’d obtain the observed difference or more in 4% of studies due to random sampling error.
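This definition can be made concrete by simulation: generate many studies under a true null and count how often random sampling alone produces a difference at least as extreme as the observed one (a sketch with made-up two-group data, not the vaccine study):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 100                                  # per-group sample size (made up)
group_a = rng.normal(0.0, 1.0, n)
group_b = rng.normal(0.4, 1.0, n)        # a real effect of 0.4
observed = group_b.mean() - group_a.mean()
_, p = stats.ttest_ind(group_b, group_a)

# Under a true null (no effect), how often does random sampling alone
# produce a difference at least as extreme as the observed one?
sims = rng.normal(0, 1, (10_000, 2, n))
null_diffs = sims[:, 0].mean(axis=1) - sims[:, 1].mean(axis=1)
frac = np.mean(np.abs(null_diffs) >= abs(observed))
print(p, frac)   # the simulated fraction approximates the P value
```

The match between the analytic P value and the simulated tail fraction is the entire content of the definition above, and nothing more.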
P values address only one question: how likely are your data, assuming a true null hypothesis? They do not measure support for the alternative hypothesis. This limitation leads us into the next section, which covers a very common misinterpretation of P values.
Incorrect interpretations of P values are very common. The most common mistake is to interpret a P value as the probability of making a mistake by rejecting a true null hypothesis (a Type I error).
There are several reasons why P values can’t be the error rate.
First, P values are calculated based on the assumptions that the null is true for the population and that the difference in the sample is caused entirely by random chance. Consequently, P values can’t tell you the probability that the null is true or false because it is 100% true from the perspective of the calculations.
Second, while a low P value indicates that your data are unlikely assuming a true null, it can't evaluate which of two competing cases is more likely:

- the null hypothesis is true, and your sample was unusual due to random sampling error, or
- the null hypothesis is false.
Determining which case is more likely requires subject area knowledge and replicate studies.
Let’s go back to the vaccine study and compare the correct and incorrect way to interpret the P value of 0.04:

- Correct: assuming the vaccine had no effect, you’d obtain the observed difference or more in 4% of studies because of random sampling error.
- Incorrect: if you reject the null hypothesis, there is a 4% chance that you’re making a mistake.
To see a graphical representation of how hypothesis tests work, see my post: Understanding Hypothesis Tests: Significance Levels and P Values.
Think that this interpretation difference is simply a matter of semantics, and only important to picky statisticians? Think again. It’s important to you.
If a P value is not the error rate, what the heck is the error rate? (Can you guess which way this is heading now?)
Sellke et al.* have estimated the error rates associated with different P values. While the precise error rate depends on various assumptions (which I discuss here), the table summarizes them for middle-of-the-road assumptions.
| P value | Probability of incorrectly rejecting a true null hypothesis |
|---------|-------------------------------------------------------------|
| 0.05    | At least 23% (and typically close to 50%)                   |
| 0.01    | At least 7% (and typically close to 15%)                    |
Do the higher error rates in this table surprise you? Unfortunately, the common misinterpretation of P values as the error rate creates the illusion of substantially more evidence against the null hypothesis than is justified. As you can see, if you base a decision on a single study with a P value near 0.05, the difference observed in the sample may not exist at the population level. That can be costly!
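The gap between α and this error rate can be reproduced with a rough simulation: give half the studies a true null, give the rest a real effect, and look only at studies whose P value lands just under 0.05 (a sketch; the 50/50 prior and an effect size giving roughly 80% power are assumptions for illustration, not Sellke et al.'s exact model):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
trials = 400_000
# Half the studies test a true null; the other half test a real effect
# sized so that power is about 0.8 at alpha = 0.05 (z-test, noncentrality 2.8)
null_true = rng.random(trials) < 0.5
z = rng.normal(np.where(null_true, 0.0, 2.8), 1.0)
p = 2 * stats.norm.sf(np.abs(z))

# Among studies that landed just under the 0.05 threshold,
# what fraction were actually testing a true null?
border = (p > 0.04) & (p < 0.05)
error_rate = null_true[border].mean()
print(error_rate)   # well above 0.05 under these assumptions
```

Under these assumptions the fraction of borderline-significant results that come from true nulls is far higher than 5%, which is the point of the table: a P value near 0.05 is much weaker evidence than the misinterpretation suggests.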
Now that you know how to interpret P values, read my five guidelines for how to use P values and avoid mistakes.
You can also read my rebuttal to an academic journal that actually banned P values!