Why is the RSS distributed as $\chi^2_{n-p}$?



I would like to understand why, under the OLS model, the RSS (residual sum of squares) is distributed as $\chi^2_{n-p}$, where $p$ is the number of parameters in the model and $n$ is the number of observations.

I apologize for asking such a basic question, but I can't seem to find the answer online (or in my more application-oriented textbooks).


Note that the claim in the title is not quite correct: the distribution of the RSS is $\sigma^2$ (not $n-p$) times a $\chi^2_{n-p}$ distribution, where $\sigma^2$ is the true variance of the errors.
ub

Answers:



I consider the following linear model: $y = X\beta + \epsilon$.

The vector of residuals is estimated by

$$\hat{\epsilon} = y - X\hat{\beta} = (I - X(X'X)^{-1}X')\,y = Qy = Q(X\beta + \epsilon) = Q\epsilon$$

where $Q = I - X(X'X)^{-1}X'$.

Observe that $\operatorname{tr}(Q) = n - p$ (the trace is invariant under cyclic permutation) and that $Q' = Q = Q^2$. The eigenvalues of $Q$ are therefore $0$ and $1$ (some details below). Hence, there exists a unitary matrix $V$ (matrices can be diagonalized by unitary matrices if and only if they are normal) such that

$$V'QV = \Delta = \operatorname{diag}(\underbrace{1, \ldots, 1}_{n-p \text{ times}}, \underbrace{0, \ldots, 0}_{p \text{ times}})$$
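
As a quick sanity check, here is a small numpy sketch verifying these properties of $Q$; the dimensions $n = 50$, $p = 3$ and the Gaussian design matrix are arbitrary illustrative choices, not part of the argument:

```python
# Numerical sanity check of the claimed properties of Q.
# n = 50, p = 3 and the Gaussian design are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))

# Q = I - X (X'X)^{-1} X'
Q = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

assert np.allclose(Q, Q.T)             # Q' = Q   (symmetric)
assert np.allclose(Q, Q @ Q)           # Q  = Q^2 (idempotent)
assert np.isclose(np.trace(Q), n - p)  # tr(Q) = n - p

w, V = np.linalg.eigh(Q)               # eigenvalues in ascending order
assert np.allclose(w[:p], 0, atol=1e-8) and np.allclose(w[p:], 1)

V = V[:, ::-1]                         # put the n - p unit eigenvalues first
Delta = np.diag(np.r_[np.ones(n - p), np.zeros(p)])
assert np.allclose(V.T @ Q @ V, Delta, atol=1e-8)
```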

Now, let $K = V'\hat{\epsilon}$.

Since $\hat{\epsilon} \sim N(0, \sigma^2 Q)$, we have $K \sim N(0, \sigma^2 \Delta)$, and therefore $K_{n-p+1} = \cdots = K_n = 0$. Thus

$$\frac{\|K\|^2}{\sigma^2} = \frac{\|K^{\star}\|^2}{\sigma^2} \sim \chi^2_{n-p}$$

where $K^{\star} = (K_1, \ldots, K_{n-p})'$.

Further, since $V$ is a unitary matrix, we also have

$$\|\hat{\epsilon}\|^2 = \|K\|^2 = \|K^{\star}\|^2$$

Thus

$$\frac{\text{RSS}}{\sigma^2} \sim \chi^2_{n-p}$$

Finally, observe that this result implies that

$$E\left(\frac{\text{RSS}}{n-p}\right) = \sigma^2$$
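
Both facts are easy to verify by simulation. The following minimal Monte Carlo sketch (with illustrative values $n = 30$, $p = 4$, $\sigma = 2$, and a Kolmogorov-Smirnov test from scipy as one possible check) compares the simulated distribution of $\text{RSS}/\sigma^2$ with the $\chi^2_{n-p}$ law:

```python
# Monte Carlo check that RSS / sigma^2 ~ chi^2_{n-p} and E(RSS / (n-p)) = sigma^2.
# n, p, sigma and the number of replications are arbitrary illustrative values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, p, sigma, reps = 30, 4, 2.0, 20_000
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)

rss = np.empty(reps)
for i in range(reps):
    y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_hat
    rss[i] = resid @ resid

scaled = rss / sigma**2
print(scaled.mean())             # ~ n - p = 26, the mean of chi^2_{n-p}
print((rss / (n - p)).mean())    # ~ sigma^2 = 4 (unbiasedness)
print(stats.kstest(scaled, stats.chi2(df=n - p).cdf))  # large p-value expected
```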

Since $Q^2 - Q = 0$, the minimal polynomial of $Q$ divides the polynomial $z^2 - z$. So, the eigenvalues of $Q$ are among $0$ and $1$. Since $\operatorname{tr}(Q) = n - p$ is also the sum of the eigenvalues multiplied by their multiplicities, we necessarily have that $1$ is an eigenvalue with multiplicity $n-p$ and $0$ is an eigenvalue with multiplicity $p$.


(+1) Good answer. One can restrict attention to orthogonal, instead of unitary, $V$ since $Q$ is real and symmetric. Also, what is SCR? I do not see it defined. By slightly rejiggering the argument, one can also avoid the use of a degenerate normal, in case that causes some consternation to those not familiar with it.
cardinal

@Cardinal. Good point. SCR ('Somme des Carrés Résiduels' in French) should have been RSS.
ocram

Thank you for the detailed answer, Ocram! Some steps will require me to look into them more, but I have an outline to think about now - thanks!
Tal Galili

@Glen_b: Oh, I made an edit a couple of days ago to change SCR to SRR. I didn't remember that SCR is mentioned in my comment. Sorry for the confusion.
ocram

@Glen_b: It was supposed to mean RSS :-S Edited again. Thx
ocram


IMHO, the matrix notation $Y = X\beta + \epsilon$ complicates things. Pure vector space language is cleaner. The model can be written $Y = \mu + \sigma G$, where $G$ has the standard normal distribution on $\mathbb{R}^n$ and $\mu$ is assumed to belong to a vector subspace $W \subset \mathbb{R}^n$.

Now the language of elementary geometry comes into play. The least-squares estimator $\hat{\mu}$ of $\mu$ is nothing but $P_W Y$: the orthogonal projection of the observable $Y$ onto the space $W$ to which $\mu$ is assumed to belong. The vector of residuals is $P_{W^\perp} Y$: the projection onto the orthogonal complement $W^\perp$ of $W$ in $\mathbb{R}^n$. The dimension of $W^\perp$ is $\dim(W^\perp) = n - \dim(W)$.

Finally,

$$P_{W^\perp} Y = P_{W^\perp}(\mu + \sigma G) = 0 + \sigma P_{W^\perp} G,$$

and $P_{W^\perp} G$ has the standard normal distribution on $W^\perp$, hence its squared norm has the $\chi^2$ distribution with $\dim(W^\perp)$ degrees of freedom.

This demonstration uses only one theorem, actually a definition-theorem:

Definition and theorem. A random vector in $\mathbb{R}^n$ has the standard normal distribution on a vector space $U \subset \mathbb{R}^n$ if it takes its values in $U$ and its coordinates in one (equivalently, in all) orthonormal basis of $U$ are independent one-dimensional standard normal random variables.

(from this definition-theorem, Cochran's theorem is so obvious that it is not worth stating)
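
To make the geometry concrete, here is a small numerical sketch in which $W$ is taken to be the column space of an arbitrary full-rank matrix; the sizes $n = 20$, $\dim(W) = 5$ and the number of draws are illustrative choices:

```python
# Numerical illustration of the projection argument. W is the column space
# of a random matrix X; n = 20 and dim(W) = 5 are illustrative only.
import numpy as np

rng = np.random.default_rng(7)
n, k = 20, 5
X = rng.standard_normal((n, k))

U, _ = np.linalg.qr(X)             # orthonormal basis of W = col(X)
P_W = U @ U.T                      # orthogonal projector onto W
P_Wperp = np.eye(n) - P_W          # projector onto W^perp

assert np.isclose(np.trace(P_Wperp), n - k)   # dim(W^perp) = n - dim(W)

# ||P_{W^perp} G||^2 over many standard normal draws should follow chi^2_{n-k}.
G = rng.standard_normal((10_000, n))
proj = G @ P_Wperp                 # rows are P_{W^perp} applied to each draw
sq_norms = np.einsum('ij,ij->i', proj, proj)
print(sq_norms.mean())             # ~ n - k = 15, the chi^2_{n-k} mean
print(sq_norms.var())              # ~ 2(n - k) = 30, the chi^2_{n-k} variance
```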
