When comparing 2 independent means, the SPSS t-test procedure reports two analyses, one assuming equal variances and one not assuming equal variances. The degrees of freedom (df) when equal variances are assumed are always integer-valued (and equal n-2). The df when equal variances are not assumed are non-integer (e.g., 11.467) and nowhere near n-2. I am seeking an explanation of the logic and method used to calculate these non-integer df.
Answers:
The Welch–Satterthwaite d.f. can be shown to be a scaled weighted harmonic mean of the two degrees of freedom, with weights in proportion to the corresponding squared standard errors.
The original expression reads:

$$\nu_W = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}}$$

Note that $r_i = s_i^2/n_i$ is the estimated variance of the $i$th sample mean, or the square of the $i$th standard error of the mean. Let $r = r_1/r_2$ (the ratio of the estimated variances of the sample means) and $\nu_i = n_i - 1$, so

$$\nu_W = \frac{(r_1+r_2)^2}{\frac{r_1^2}{\nu_1}+\frac{r_2^2}{\nu_2}} = \frac{(r+1)^2}{\frac{r^2}{\nu_1}+\frac{1}{\nu_2}} = \frac{(r+1)^2}{r^2+1}\cdot\frac{r^2+1}{\frac{r^2}{\nu_1}+\frac{1}{\nu_2}}$$

The first factor is $1+\operatorname{sech}(\log(r))$, which increases from $1$ at $r=0$ to $2$ at $r=1$ and then decreases to $1$ at $r\to\infty$; it's symmetric in $\log r$.
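As a numerical sanity check (a sketch, assuming NumPy is available), the factored form — $(1+\operatorname{sech}(\log r))$ times a weighted harmonic mean of the d.f. — reproduces the standard Welch–Satterthwaite formula exactly:

```python
import numpy as np

def welch_df(s1, n1, s2, n2):
    """Standard Welch-Satterthwaite df from sample SDs and sizes."""
    r1, r2 = s1**2 / n1, s2**2 / n2  # squared standard errors of the means
    return (r1 + r2)**2 / (r1**2 / (n1 - 1) + r2**2 / (n2 - 1))

def welch_df_factored(s1, n1, s2, n2):
    """Same df written as (1 + sech(log r)) times a weighted harmonic mean."""
    r1, r2 = s1**2 / n1, s2**2 / n2
    r = r1 / r2
    nu1, nu2 = n1 - 1, n2 - 1
    first = 1 + 1 / np.cosh(np.log(r))          # 1 + sech(log r), in (1, 2]
    w1, w2 = r**2, 1.0                          # weights proportional to r_i^2
    second = (w1 + w2) / (w1 / nu1 + w2 / nu2)  # weighted harmonic mean of the df
    return first * second

print(welch_df(3.0, 8, 1.0, 12))           # non-integer df, ~8.05
print(welch_df_factored(3.0, 8, 1.0, 12))  # same value via the factorization
```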
The second factor is a weighted harmonic mean:

$$\mathcal{H}(\nu_1, \nu_2) = \frac{\sum_i w_i}{\sum_i \frac{w_i}{\nu_i}}$$

of the d.f., where $w_i = r_i^2$ are the relative weights to the two d.f.
Which is to say, when $r_1/r_2$ is very large, it converges to $\nu_1$. When $r_1/r_2$ is very close to $0$ it converges to $\nu_2$. When $r_1 = r_2$ you get twice the harmonic mean of the d.f., and when $\nu_1 = \nu_2$ as well you get the usual equal-variance t-test d.f. ($\nu_1 + \nu_2 = n_1 + n_2 - 2$), which is also the maximum possible value for $\nu_W$.
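These limiting cases are easy to verify numerically (a sketch; `welch_df` here is just the Welch–Satterthwaite formula written in terms of the squared standard errors $r_i$ and per-group d.f. $\nu_i$):

```python
def welch_df(r1, r2, nu1, nu2):
    """Welch-Satterthwaite df in terms of the squared standard errors r_i
    of the two sample means and the per-group df nu_i = n_i - 1."""
    return (r1 + r2)**2 / (r1**2 / nu1 + r2**2 / nu2)

nu1, nu2 = 9, 14
# r1 >> r2: df approaches nu1
print(welch_df(1e6, 1.0, nu1, nu2))   # ~ 9
# r1 << r2: df approaches nu2
print(welch_df(1e-6, 1.0, nu1, nu2))  # ~ 14
# r1 == r2: twice the harmonic mean of nu1 and nu2
print(welch_df(1.0, 1.0, nu1, nu2), 2 * 2 / (1 / nu1 + 1 / nu2))
# r1 == r2 and nu1 == nu2: the pooled df nu1 + nu2 (the maximum)
print(welch_df(1.0, 1.0, 10, 10))     # 20
```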
--
With an equal-variance t-test, if the assumptions hold, the square of the denominator is a constant times a chi-square random variate.
The square of the denominator of the Welch t-test isn't (a constant times) a chi-square; however, it's often not too bad an approximation. A relevant discussion can be found here.
A more textbook-style derivation can be found here.
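For completeness, the textbook derivation alluded to is a short moment-matching argument (a sketch, using the notation above, $\nu_i = n_i - 1$). Under normality, $\nu_i s_i^2/\sigma_i^2 \sim \chi^2_{\nu_i}$, and we approximate the distribution of $S = s_1^2/n_1 + s_2^2/n_2$ by that of $c\,\chi^2_\nu/\nu$ for constants $c$ and $\nu$. Matching the first two moments,

$$E(S) = \frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2} = c, \qquad \operatorname{Var}(S) = \frac{2(\sigma_1^2/n_1)^2}{\nu_1}+\frac{2(\sigma_2^2/n_2)^2}{\nu_2} = \frac{2c^2}{\nu},$$

which gives

$$\nu = \frac{\left(\sigma_1^2/n_1+\sigma_2^2/n_2\right)^2}{\frac{(\sigma_1^2/n_1)^2}{\nu_1}+\frac{(\sigma_2^2/n_2)^2}{\nu_2}},$$

and substituting the sample variances $s_i^2$ for the unknown $\sigma_i^2$ yields the Welch–Satterthwaite $\nu_W$.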
What you are referring to is the Welch–Satterthwaite correction to the degrees of freedom. The $t$-test when the WS correction is applied is often called Welch's $t$-test. (Incidentally, this has nothing to do with SPSS; all statistical software will be able to conduct Welch's $t$-test, they just don't usually report both side by side by default, so you wouldn't necessarily be prompted to think about the issue.) The equation for the correction is very ugly, but can be seen on the Wikipedia page; unless you are very math savvy or a glutton for punishment, I don't recommend trying to work through it to understand the idea.

From a loose conceptual standpoint however, the idea is relatively straightforward: the regular $t$-test assumes the variances are equal in the two groups. If they're not, then the test should not benefit from that assumption. Since the power of the $t$-test can be seen as a function of the residual degrees of freedom, one way to adjust for this is to 'shrink' the df somewhat. The appropriate df must be somewhere between the full df and the df of the smaller group. (As @Glen_b notes below, it depends on the relative sizes of $s_1^2/n_1$ vs. $s_2^2/n_2$; if the larger $n$ is associated with a sufficiently smaller variance, the combined df can be lower than the larger of the two df.) The WS correction finds the right proportion of the way from the former to the latter to adjust the df. Then the test statistic is assessed against a $t$-distribution with that df.
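To see the 'shrinkage' concretely, here is a minimal sketch (assuming NumPy) that reproduces the two df values SPSS prints side by side for an independent-samples $t$-test; the data and seed are arbitrary illustration choices:

```python
import numpy as np

def two_sample_dfs(x, y):
    """Return (pooled df, Welch-Satterthwaite df) for two samples,
    mirroring the two rows SPSS reports for an independent-samples t-test."""
    n1, n2 = len(x), len(y)
    v1, v2 = np.var(x, ddof=1), np.var(y, ddof=1)
    pooled_df = n1 + n2 - 2                      # equal-variance assumption
    r1, r2 = v1 / n1, v2 / n2                    # squared standard errors
    welch_df = (r1 + r2)**2 / (r1**2 / (n1 - 1) + r2**2 / (n2 - 1))
    return pooled_df, welch_df

rng = np.random.default_rng(0)
x = rng.normal(0, 1, size=8)    # small, low-variance group
y = rng.normal(0, 5, size=12)   # larger, high-variance group
print(two_sample_dfs(x, y))     # Welch df is non-integer and below n1+n2-2
```

In practice one would simply pass `equal_var=False` to `scipy.stats.ttest_ind` to run Welch's test rather than computing the df by hand.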