How can we intuitively explain (X′X)⁻¹ in the variance formula Var(β̂) = σ²(X′X)⁻¹? The technical derivation is clear to me.
Answers:
Consider a simple regression without a constant term, where the single regressor is centered at its sample mean. Then X′X is n times its sample variance, and (X′X)⁻¹ is the reciprocal of that quantity. So the greater the variance (variability) of the regressor, the smaller the variance of the coefficient estimator: the more variability we have in the explanatory variable, the more precisely we can estimate the unknown coefficient.

Why? Because the more a regressor varies, the more information it contains. With many regressors, this generalizes to the inverse of their variance-covariance matrix, which also takes into account the covariances between the regressors. In the extreme case where X′X is diagonal, the precision of each estimated coefficient depends only on the variance (variability) of the associated regressor, given the variance of the error term.
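To make the covariance point concrete, here is a small NumPy sketch (the sample size, correlation levels, and σ² are arbitrary choices for the example): making two regressors more correlated inflates the diagonal entries of σ²(X′X)⁻¹, i.e. both coefficient variances.

```python
import numpy as np

# Sketch: with two regressors, raising their correlation inflates the
# diagonal of sigma^2 (X'X)^{-1}. All numbers here are made up.
rng = np.random.default_rng(4)
n = 100
sigma2 = 1.0
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)

def coef_variances(rho):
    # Build a second regressor with (population) correlation rho to z1.
    x2 = rho * z1 + np.sqrt(1.0 - rho**2) * z2
    X = np.column_stack([z1, x2])
    return sigma2 * np.diag(np.linalg.inv(X.T @ X))

v_uncorrelated = coef_variances(0.0)
v_correlated = coef_variances(0.95)
assert np.all(v_correlated > v_uncorrelated)  # correlation hurts precision
```

This is the familiar variance-inflation effect: correlated regressors carry overlapping information, so each coefficient is pinned down less precisely.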
A simple way to view σ²(X′X)⁻¹ is as the matrix (multivariate) analogue of σ²/∑ᵢ(xᵢ − x̄)², which is the variance of the slope coefficient in simple OLS regression.
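That analogy can be checked directly in a few lines of NumPy (the data and σ² below are arbitrary): for a single mean-centered regressor without an intercept, the matrix formula collapses to the scalar one.

```python
import numpy as np

# Sketch: for one mean-centered regressor (no intercept), the matrix
# formula sigma^2 (X'X)^{-1} equals sigma^2 / sum((x_i - xbar)^2).
rng = np.random.default_rng(0)
x = rng.normal(size=20)
x = x - x.mean()                 # center the regressor
X = x.reshape(-1, 1)             # n x 1 design matrix
sigma2 = 2.5                     # hypothetical error variance

var_matrix = sigma2 * np.linalg.inv(X.T @ X)[0, 0]
var_scalar = sigma2 / np.sum((x - x.mean()) ** 2)
assert np.isclose(var_matrix, var_scalar)
```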
From either of these formulas it can be seen that larger variability in the predictor will generally lead to more precise estimation of its coefficient. This idea is often exploited in experimental design: by choosing the values of the (non-random) predictors, one tries to make the determinant of X′X as large as possible, the determinant being a measure of variability.
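A quick numerical illustration of that design idea (the two designs below are invented for the example): pushing the design points to the extremes gives a larger determinant of X′X and a smaller variance for the slope estimate than bunching them in the middle.

```python
import numpy as np

sigma2 = 1.0  # hypothetical error variance

def slope_var_and_det(x):
    X = np.column_stack([np.ones_like(x), x])   # intercept + one predictor
    XtX = X.T @ X
    # Variance of the slope estimate, and determinant of X'X.
    return sigma2 * np.linalg.inv(XtX)[1, 1], np.linalg.det(XtX)

# Same sample size, two choices of (non-random) design points:
v_spread, d_spread = slope_var_and_det(np.array([-1.0] * 5 + [1.0] * 5))
v_mid, d_mid = slope_var_and_det(np.linspace(-0.1, 0.1, 10))

assert d_spread > d_mid   # larger determinant of X'X ...
assert v_spread < v_mid   # ... smaller variance of the slope estimate
```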
Would a linear transformation of a Gaussian random variable help? Use the rule that if X ~ N(μ, Σ), then AX + b ~ N(Aμ + b, AΣA′).

Assume that Y = Xβ + ε is the underlying model and ε ~ N(0, σ²I).
∴ Y ~ N(Xβ, σ²I)
X′Y ~ N(X′Xβ, σ²X′X)
(X′X)⁻¹X′Y ~ N(β, σ²(X′X)⁻¹)
Thus (X′X)⁻¹X′ is just a complicated scaling matrix that transforms the distribution of Y.
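The derivation above can be sanity-checked by Monte Carlo (a sketch; the design, β, and σ chosen below are arbitrary): simulate many draws of Y, compute (X′X)⁻¹X′Y for each, and compare the empirical mean and covariance of the estimates with β and σ²(X′X)⁻¹.

```python
import numpy as np

# Monte Carlo check of (X'X)^{-1} X'Y ~ N(beta, sigma^2 (X'X)^{-1}).
rng = np.random.default_rng(1)
n, p = 50, 2
X = rng.normal(size=(n, p))          # fixed (non-random thereafter) design
beta = np.array([1.0, -2.0])
sigma = 0.5

XtX_inv = np.linalg.inv(X.T @ X)
reps = 20000
betas = np.empty((reps, p))
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)
    betas[r] = XtX_inv @ X.T @ y     # OLS estimate (X'X)^{-1} X'y

emp_mean = betas.mean(axis=0)
emp_cov = np.cov(betas, rowvar=False)
theory_cov = sigma**2 * XtX_inv

assert np.allclose(emp_mean, beta, atol=0.01)
assert np.allclose(emp_cov, theory_cov, atol=1e-3)
```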
Hope this helps.
I'll take a different approach towards developing the intuition that underlies the formula Var(β̂) = σ²(X′X)⁻¹.
To help develop the intuition, we will assume that the simplest Gauss–Markov assumptions are satisfied: the regressors xᵢ are non-stochastic, and the errors have mean zero, constant variance σ², and are uncorrelated across observations.
Why should doubling the sample size, ceteris paribus, cause the variance of β̂ to be cut in half? In the simple regression case Var(β̂) = σ²/∑ᵢ(xᵢ − x̄)², and doubling the number of observations roughly doubles the sum in the denominator: each additional observation contributes independent information about β, so the estimate becomes more precise.
Let's turn, then, to your main question, which is about developing intuition for the claim that the variance of β̂ is inversely related to the variability of the regressor. Imagine two experiments that are identical except for the spread of the regressor values, with Var x⁽¹⁾ > Var x⁽²⁾.

Because by assumption Var x⁽¹⁾ > Var x⁽²⁾, the slope estimate based on the first sample has the smaller variance: the more spread out the regressor values are, the easier it is to pin down the slope of the line through the cloud of points.
It is reasonably straightforward to generalize the intuition obtained from studying the simple regression model to the general multiple linear regression model. The main complication is that instead of comparing scalar variances, it is necessary to compare the "size" of variance-covariance matrices. Having a good working knowledge of determinants, traces and eigenvalues of real symmetric matrices comes in very handy at this point :-)
Say we have n observations (or sample size) and p parameters.
The covariance matrix Var(β̂) of the estimated parameters β̂₁, β̂₂, etc. is a representation of the accuracy of the estimated parameters.
If, in an ideal world, the data could be perfectly described by the model, then the noise would be σ² = 0. The diagonal entries of Var(β̂) correspond to Var(β̂₁), Var(β̂₂), etc. The derived formula Var(β̂) = σ²(XᵀX)⁻¹ agrees with the intuition that if the noise is lower, the estimates will be more accurate.
In addition, as the number of measurements grows, the variance of the estimated parameters decreases. Each entry of the p×p matrix XᵀX is a sum of n products (Xᵀ has n columns and X has n rows), so the entries of XᵀX grow in magnitude with n, and the entries of the inverse (XᵀX)⁻¹ correspondingly shrink.
Hence, even if there is a lot of noise, we can still obtain good estimates β̂ᵢ of the parameters if we increase the sample size n.
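This can be seen numerically (a sketch; the design and the deliberately large σ² are made up for the example): the diagonal of σ²(XᵀX)⁻¹ shrinks as n grows.

```python
import numpy as np

# Sketch: as the sample size n grows, the diagonal of sigma^2 (X^T X)^{-1}
# shrinks, even with very noisy data.
rng = np.random.default_rng(2)
sigma2 = 4.0                              # deliberately large noise variance

def param_variances(n):
    # Design with an intercept and one random regressor (p = 2 parameters).
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    return sigma2 * np.diag(np.linalg.inv(X.T @ X))

var_small_n = param_variances(20)
var_large_n = param_variances(2000)
assert np.all(var_large_n < var_small_n)  # more measurements -> tighter estimates
```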
I hope this helps.
Reference: Section 7.3 on least squares in Cosentino, Carlo, and Declan Bates. Feedback Control in Systems Biology. CRC Press, 2011.
This builds on @Alecos Papadopoulos' answer.
Recall that the result of a least-squares regression doesn't depend on the units of measurement of your variables. Suppose your X-variable is a length measurement, given in inches. Then rescaling X, say by multiplying by 2.54 to change the unit to centimeters, doesn't materially affect things. If you refit the model, the new regression estimate will be the old estimate divided by 2.54.
The X′X matrix is, in essence, the (scaled) variance of X, and hence reflects the scale of measurement of X. If you change the scale, you have to reflect this in your estimate of β, and this is done by multiplying by the inverse of X′X.
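The rescaling argument is easy to verify numerically (a sketch with simulated data; the inch-to-centimeter factor 2.54 is from the example above): refitting after multiplying X by 2.54 divides the slope estimate by exactly 2.54 and leaves the intercept unchanged.

```python
import numpy as np

# Sketch: changing the unit of X (inches -> centimeters) rescales the slope
# by 1/2.54 and leaves the fit otherwise unchanged. Data are simulated.
rng = np.random.default_rng(3)
n = 30
x_inches = rng.uniform(10, 50, size=n)
y = 1.5 * x_inches + rng.normal(scale=2.0, size=n)

def ols(x, y):
    X = np.column_stack([np.ones_like(x), x])   # intercept + predictor
    return np.linalg.solve(X.T @ X, X.T @ y)    # (X'X)^{-1} X'y

b_inches = ols(x_inches, y)
b_cm = ols(2.54 * x_inches, y)                  # same data, new unit

assert np.isclose(b_cm[1], b_inches[1] / 2.54)  # slope rescales
assert np.isclose(b_cm[0], b_inches[0])         # intercept unchanged
```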