Relationship between confidence intervals and testing statistical hypotheses with the t-test


31

It is well known that confidence intervals and statistical hypothesis tests are closely related. My question concerns comparing the means of two groups on a numeric variable. Suppose such a hypothesis is tested with a t-test. On the other hand, one can compute confidence intervals for the means of both groups. Is there any relationship between the overlap of the confidence intervals and the rejection of the null hypothesis that the means are equal (in favor of the alternative that the means differ — a two-sided test)? For example, if the confidence intervals do not overlap, would the test reject the null hypothesis?

Answers:


31

Yes, some simple relationships between confidence-interval comparisons and hypothesis tests exist in a wide range of practical settings. However, in addition to verifying that the CI procedure and t-test are appropriate for our data, we must check that the sample sizes do not differ greatly and that the standard deviations of the two groups are similar. Nor should we try to squeeze highly precise p-values out of comparing two confidence intervals; we should be happy, instead, to develop effective approximations.

In attempting to reconcile the two replies already given (by @John and @Brett), it helps to be mathematically explicit. A formula for a symmetric two-sided confidence interval appropriate for this question is

$$\mathrm{CI} = m \pm t_\alpha(n)\,\frac{s}{\sqrt{n}}$$

where $m$ is the sample mean of $n$ independent observations, $s$ is the sample standard deviation, $2\alpha$ is the desired test size (the maximum false-positive rate), and $t_\alpha(n)$ is the upper $1-\alpha$ percentile of the Student t distribution with $n-1$ degrees of freedom. (This slight departure from conventional notation simplifies the exposition by avoiding any need to fuss over the $n$ versus $n-1$ distinction, which will be inconsequential anyway.)
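As a concrete illustration, here is a minimal Python sketch of this interval (the function name `t_ci` and the example data are my own; `scipy` supplies the t percentiles):

```python
import numpy as np
from scipy import stats

def t_ci(x, alpha=0.025):
    """Symmetric two-sided CI: m +/- t_alpha(n) * s / sqrt(n).

    Here alpha is half the test size (the interval has size 2*alpha),
    and t_alpha(n) is the upper 1 - alpha percentile of Student's t
    with n - 1 degrees of freedom, matching the notation above.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = x.mean()
    s = x.std(ddof=1)                      # sample standard deviation
    t = stats.t.ppf(1 - alpha, df=n - 1)   # upper 1 - alpha percentile
    half_width = t * s / np.sqrt(n)
    return m - half_width, m + half_width

lo, hi = t_ci([4.1, 5.0, 3.8, 4.6, 4.4])   # a 95% CI, since 2*alpha = 0.05
```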

Using subscripts $1$ and $2$ to distinguish the two independent sets of data being compared, with $1$ corresponding to the larger of the two means, non-overlap of the confidence intervals is expressed by the inequality (lower confidence limit 1) $>$ (upper confidence limit 2); that is,

$$m_1 - t_\alpha(n_1)\frac{s_1}{\sqrt{n_1}} > m_2 + t_\alpha(n_2)\frac{s_2}{\sqrt{n_2}}.$$

Simple algebraic manipulation makes this look like the t statistic of the corresponding hypothesis test (to compare the two means), yielding

$$\frac{m_1 - m_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} > \frac{s_1\sqrt{n_2}\,t_\alpha(n_1) + s_2\sqrt{n_1}\,t_\alpha(n_2)}{\sqrt{n_1 s_2^2 + n_2 s_1^2}}.$$

The left-hand side is the statistic used in the hypothesis test; it is usually compared to a percentile of the Student t distribution with $n_1 + n_2$ degrees of freedom: that is, to $t_\alpha(n_1+n_2)$. The right-hand side is a biased weighted average of the original t distribution percentiles.
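The rearrangement can be sanity-checked numerically. This sketch (group sizes, means, and the simplified degrees-of-freedom convention are arbitrary choices of mine) confirms that the non-overlap condition and the rearranged inequality always agree:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def overlap_algebra_agrees(alpha=0.025, n1=8, n2=12):
    """Check that non-overlap of the CIs and the rearranged
    inequality give the same verdict on random data."""
    x1, x2 = rng.normal(1.0, 1.0, n1), rng.normal(0.0, 1.0, n2)
    if x1.mean() < x2.mean():               # label group 1 as the larger mean
        (x1, n1), (x2, n2) = (x2, n2), (x1, n1)
    m1, m2 = x1.mean(), x2.mean()
    s1, s2 = x1.std(ddof=1), x2.std(ddof=1)
    t = lambda n: stats.t.ppf(1 - alpha, df=n)   # simplified df convention
    non_overlap = m1 - t(n1) * s1 / np.sqrt(n1) > m2 + t(n2) * s2 / np.sqrt(n2)
    lhs = (m1 - m2) / np.sqrt(s1**2 / n1 + s2**2 / n2)
    rhs = (s1 * np.sqrt(n2) * t(n1) + s2 * np.sqrt(n1) * t(n2)) \
          / np.sqrt(n1 * s2**2 + n2 * s1**2)
    return non_overlap == (lhs > rhs)

assert all(overlap_algebra_agrees() for _ in range(200))
```

The agreement is exact because both sides of the non-overlap inequality were merely divided by the same positive quantity.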

The analysis so far justifies the reply by @Brett: no simple relationship appears to be available. However, let's probe further. I am inspired to do so because, intuitively, non-overlapping confidence intervals ought to say something!

First, notice that this form of the hypothesis test is valid only when we expect $s_1$ and $s_2$ to be at least approximately equal. (Otherwise we face the notorious Behrens-Fisher problem and its complexities.) Upon checking the approximate equality of the $s_i$, we can then create an approximate simplification in the form

$$\frac{m_1 - m_2}{s\sqrt{1/n_1 + 1/n_2}} > \frac{\sqrt{n_2}\,t_\alpha(n_1) + \sqrt{n_1}\,t_\alpha(n_2)}{\sqrt{n_1 + n_2}}.$$

Here, $s \approx s_1 \approx s_2$. Realistically, we should not expect this informal comparison of confidence limits to have the same size as $\alpha$. Our question then is whether there exists an $\alpha'$ such that the right-hand side is (at least approximately) equal to the correct t statistic. That is, for what $\alpha'$ is it the case that

$$t_{\alpha'}(n_1+n_2) = \frac{\sqrt{n_2}\,t_\alpha(n_1) + \sqrt{n_1}\,t_\alpha(n_2)}{\sqrt{n_1 + n_2}}\,?$$
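Since $t_{\alpha'}$ is just an upper percentile, $\alpha'$ can be read directly off the t survival function rather than solved for iteratively. A sketch (the function name is mine, and I use the simplified degrees-of-freedom convention from above):

```python
import numpy as np
from scipy import stats

def alpha_prime(alpha, n1, n2):
    """Exact alpha' solving t_{alpha'}(n1 + n2) = weighted average above."""
    t = lambda a, n: stats.t.ppf(1 - a, df=n)    # upper 1 - a percentile
    rhs = (np.sqrt(n2) * t(alpha, n1) + np.sqrt(n1) * t(alpha, n2)) \
          / np.sqrt(n1 + n2)
    return stats.t.sf(rhs, df=n1 + n2)           # upper-tail probability

# Non-overlapping 95% CIs (alpha = 0.025), equal sample sizes:
print(2 * alpha_prime(0.025, 2, 2))        # about 0.0037
print(2 * alpha_prime(0.025, 1000, 1000))  # about 0.0056
```

The two printed values reproduce the extremes quoted near the end of this answer.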

It turns out that for equal sample sizes, $\alpha$ and $\alpha'$ are connected (quite accurately) by a power law. For instance, here is a log-log plot of the two for the cases $n_1=n_2=2$ (lowest blue line), $n_1=n_2=5$ (middle red line), and $n_1=n_2=\infty$ (highest gold line). The middle dashed green line is the approximation described below. The straightness of these curves betrays a power law. It varies with $n = n_1 = n_2$, but not by much.

[Plot 1: log-log plot of $\alpha'$ against $\alpha$ for $n_1=n_2=2$, $5$, and $\infty$, with the approximation shown as a dashed line]

The answer does depend on the set $\{n_1, n_2\}$, but it is natural to wonder how much it really varies with the sample sizes. In particular, we could hope that for moderate to large samples (perhaps $n_1 \ge 10$ and $n_2 \ge 10$ or so) the sample sizes matter little. In that case, we could develop a quantitative way to relate $\alpha$ to $\alpha'$.

This approach works provided the sample sizes do not differ too much from each other. In the spirit of brevity, I will report an omnibus formula for computing the test size $\alpha'$ corresponding to the confidence-interval size $\alpha$. It is

$$\alpha' \approx e\,\alpha^{1.91};$$

that is,

$$\alpha' \approx \exp\left(1 + 1.91\log(\alpha)\right).$$
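In code this approximation is a one-liner (the example value $\alpha = 0.025$, i.e. a pair of 95% CIs, is my choice):

```python
import math

def alpha_prime_approx(alpha):
    """Omnibus power-law approximation: alpha' ~ e * alpha**1.91."""
    return math.exp(1 + 1.91 * math.log(alpha))

# A pair of non-overlapping two-sided 95% CIs (2*alpha = 0.05):
print(2 * alpha_prime_approx(0.025))   # roughly 0.005
```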

This formula works reasonably well in the following common situations:

  • Both sample sizes are close to each other, $n_1 \approx n_2$, and $\alpha$ is not too extreme ($\alpha > .001$ or so).

  • One sample size is within about three times the other and the smallest isn't too small (roughly, greater than 10), and again $\alpha$ is not too extreme.

  • One sample size is within three times the other and $\alpha > .02$ or so.

The relative error (correct value divided by the approximation) in the first situation is plotted here, with the lower (blue) line showing the case $n_1=n_2=2$, the middle (red) line the case $n_1=n_2=5$, and the upper (gold) line the case $n_1=n_2=\infty$. Interpolating between the latter two, we see that the approximation is excellent for a wide range of practical values of $\alpha$ when sample sizes are moderate (around 5-50) and otherwise is reasonably good.

[Plot 2: relative error of the approximation as a function of $\alpha$ for $n_1=n_2=2$, $5$, and $\infty$]

This is more than good enough for eyeballing a bunch of confidence intervals.

To summarize, the failure of two $2\alpha$-size confidence intervals of means to overlap is significant evidence of a difference in means at a level equal to $2e\,\alpha^{1.91}$, provided the two samples have approximately equal standard deviations and are approximately the same size.

I'll end with a tabulation of the approximation for common values of $2\alpha$.

    2α        2α′
    0.1       0.02
    0.05      0.005
    0.01      0.0002
    0.005     0.00006

For example, when a pair of two-sided 95% CIs ($2\alpha=.05$) for samples of approximately equal sizes do not overlap, we should take the means to be significantly different, $p < .005$. The correct p-value (for equal sample sizes $n$) actually lies between $.0037$ ($n=2$) and $.0056$ ($n=\infty$).

This result justifies (and I hope improves upon) the reply by @John. Thus, although the previous replies appear to be in conflict, both are (in their own ways) correct.


7

No, not a simple one at least.

There is, however, an exact correspondence between the t-test of difference between two means and the confidence interval for the difference between the two means.

If the confidence interval for the difference between two means contains zero, a t-test for that difference would fail to reject the null at the same level of confidence. Likewise, if the confidence interval does not contain 0, the t-test rejects the null.
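This exact correspondence is easy to demonstrate numerically. Here is a sketch assuming the pooled, equal-variance two-sample test (the simulated data and group sizes are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 30)
b = rng.normal(0.5, 1.0, 30)

# 95% CI for the difference of means, using the pooled variance.
n1, n2 = len(a), len(b)
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
tcrit = stats.t.ppf(0.975, df=n1 + n2 - 2)
d = a.mean() - b.mean()
ci = (d - tcrit * se, d + tcrit * se)

p = stats.ttest_ind(a, b).pvalue   # equal-variance t-test (the default)

# The CI excludes zero exactly when the test rejects at the 5% level.
assert (ci[0] > 0 or ci[1] < 0) == (p < 0.05)
```

Both calculations use the same pooled standard error and the same t percentile, which is why the agreement is exact rather than approximate.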

This is not the same as overlap between confidence intervals for each of the two means.


The reply by @John, although at present not quite right in the details, correctly points out that yes, you can relate overlaps of CIs to test p-values. The relationship is not any more complex than the t-test itself. This appears to contradict your primary conclusion as stated in the first line. How would you resolve this difference?
whuber

I don't think they are contradictory. I can add some caveats. But in the general sense, without additional assumptions and knowledge about parameters beyond the presentation of the interval (the variance, the sample size), the response stands as is. No, not a simple one at least.
Brett

5

Under typical assumptions of equal variance, yes, there is a relationship. If the bars overlap by less than the length of one bar * sqrt(2), then a t-test would find them to be significantly different at alpha = 0.05. If the ends of the bars just barely touch, then a difference would be found at 0.01. If the confidence interval widths for the two groups are not equal, one typically takes the average width and applies the same rule.

Alternatively, if the width of a confidence interval around one of the means is w, then the least significant difference between two values is w * sqrt(2). This is simple to see when you consider the denominator of the independent-groups t-test, sqrt(2*MSE/n), alongside the corresponding factor for the CI, sqrt(MSE/n).
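That ratio can be spelled out in a few lines (a sketch under simplifying assumptions: equal group sizes, a common error variance MSE, and the same t multiplier with pooled degrees of freedom used for both quantities; n and MSE below are arbitrary example values):

```python
import numpy as np
from scipy import stats

# CI half-width around one mean: w = t * sqrt(MSE/n).
# Least significant difference from the independent-groups t-test:
# t * sqrt(2*MSE/n), which equals w * sqrt(2).
n, mse = 25, 4.0
tcrit = stats.t.ppf(0.975, df=2 * (n - 1))   # pooled df for two groups
w = tcrit * np.sqrt(mse / n)
lsd = tcrit * np.sqrt(2 * mse / n)
assert np.isclose(lsd, w * np.sqrt(2))
```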

(95% CIs assumed)

There's a simple paper on making inferences from confidence intervals around independent means here. It will answer this question and many other related ones you may have.

Cumming, G., & Finch, S. (2005, March). Inference by eye: confidence intervals, and how to read pictures of data. American Psychologist, 60(2), 170-180.


2
I believe you also need to assume the two groups have the same sizes.
whuber

roughly, yes...
John
Licensed under cc by-sa 3.0 with attribution required.