Why is my R-squared so low when my t-statistics are so large?


17

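I ran a regression with 4 variables, and all are very statistically significant, with t-values of about 7, 9, 26, and 31 (I say "about" because it seems irrelevant to include the decimals), which are very high and clearly significant. But the $R^2$ is only 0.2284. Am I misinterpreting the t-values here to mean something they're not? My first reaction upon seeing the t-values was that the $R^2$ would be quite high, but maybe that is a high $R^2$?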


1
I would bet your $n$ is moderately large, right?
Glen_b - Reinstate Monica, 2012

@Glen_b Yes, around 6000.
Kyle

10
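Then large t-statistics associated with a small $R^2$ are completely unremarkable. Since standard errors decrease as $1/\sqrt{n}$, t-ratios will increase as $\sqrt{n}$, while $R^2$ will stay constant as $n$ increases. Why do you care what the $R^2$ is? Why do you care what the t-ratios are?
Glen_b - Reinstate Monica, 2012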
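
The scaling in the comment above can be made explicit in the simplest case. This is a sketch only, assuming a single covariate $X$ with standard deviation $\sigma_X$ (a symbol introduced here, not in the thread) and error standard deviation $\sigma$:

```latex
\begin{align*}
\operatorname{se}\{\hat\beta_1\}
  &= \frac{\hat\sigma}{\sqrt{\sum_{i=1}^{n}(X_i-\bar X)^2}}
   \approx \frac{\sigma}{\sqrt{n}\,\sigma_X}, \\
t = \frac{\hat\beta_1}{\operatorname{se}\{\hat\beta_1\}}
  &\approx \sqrt{n}\,\frac{\beta_1\sigma_X}{\sigma}, \\
R^2
  &\longrightarrow \frac{\beta_1^2\sigma_X^2}{\beta_1^2\sigma_X^2+\sigma^2}
   \qquad (n\to\infty).
\end{align*}
```

For illustration, if $\beta_1\sigma_X/\sigma = 0.5$, the limiting $R^2$ is $0.25/1.25 = 0.2$ regardless of $n$, while at $n = 6000$ the t-ratio is roughly $0.5\sqrt{6000} \approx 38.7$.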

Answers:


45

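The t-values and $R^2$ are used to judge very different things. The t-values are used to judge the accuracy of your estimates of the $\beta_i$'s, whereas $R^2$ measures the amount of variation in your response variable explained by your covariates. Suppose you are estimating a regression model with $n$ observations,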

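$$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \epsilon_i$$

where $\epsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$, $i = 1, \dots, n$.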

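Large t-values (in absolute value) lead you to reject the null hypothesis that $\beta_i = 0$. This means you can be confident that you have correctly estimated the sign of the coefficient. Also, if $|t| > 4$ and you have at least 6 residual degrees of freedom, then 0 is not in a 99% confidence interval for the coefficient. The t-value for a coefficient $\beta_i$ is the difference between the estimate $\hat{\beta}_i$ and 0, normalized by the standard error $\operatorname{se}\{\hat{\beta}_i\}$: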

$$t = \frac{\hat{\beta}_i}{\operatorname{se}\{\hat{\beta}_i\}}$$

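This is simply the estimate divided by a measure of its variability. If you have a large enough dataset, any coefficient that is not exactly zero will eventually have a statistically significant (large) t-value. This does not necessarily mean that your covariates explain much of the variation in the response variable.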

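As @Stat mentioned, $R^2$ measures the amount of variation in your response variable explained by your covariates. For more about $R^2$, see Wikipedia. In your case, it appears you have a large enough dataset to estimate the $\beta_i$'s accurately, but your covariates do a poor job of explaining and/or predicting the response values.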
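
To see this behaviour concretely, here is a minimal simulation sketch. Everything in it is an assumption made for illustration rather than the asker's data: a single standard-normal covariate, a true slope of 0.5, unit-variance noise, and the helper names `simple_ols` and `simulate`. It uses only the Python standard library.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (slope, t of slope, R^2)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares and its degrees of freedom (two fitted parameters).
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se = math.sqrt(sse / df / sxx)  # standard error of the slope
    return slope, slope / se, 1 - sse / syy


def simulate(n, beta=0.5, sigma=1.0, seed=0):
    """Draw n observations from y = beta*x + noise and fit the model."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [beta * xi + rng.gauss(0, sigma) for xi in x]
    return simple_ols(x, y)


results = {n: simulate(n) for n in (100, 400, 1600, 6400)}
for n, (slope, t, r2) in results.items():
    print(f"n={n:5d}  slope={slope:.3f}  t={t:6.2f}  R^2={r2:.3f}")
```

Under these assumptions the population $R^2$ is $0.25/1.25 = 0.2$, so the printed $R^2$ should hover near that value at every sample size while the t-value keeps growing with $n$.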


1
(+1) It is clear from the very beginning that this is a well considered, informative explanation.
whuber

Nice answer. I find the terms "practical significance" and "statistical significance" to often be helpful in thinking about this issue.
Aaron - Reinstate Monica

3
There is also a simple transformation between the two statistics: $R^2 = \frac{t^2}{t^2 + df}$
Jeff
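
The identity in this comment can be checked numerically. The sketch below assumes a simple regression (one covariate plus an intercept), where $df = n - 2$; the simulated data, seed, and variable names are illustrative only.

```python
import math
import random

rng = random.Random(42)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 + 0.3 * xi + rng.gauss(0, 1) for xi in x]

mx = sum(x) / n
my = sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
syy = sum((yi - my) ** 2 for yi in y)

slope = sxy / sxx
sse = syy - slope * sxy            # residual sum of squares
df = n - 2                         # residual degrees of freedom
t = slope / math.sqrt(sse / df / sxx)

r2 = 1 - sse / syy                 # usual definition of R^2
r2_from_t = t ** 2 / (t ** 2 + df)  # the transformation from the comment

print(f"R^2 = {r2:.6f}, t^2 / (t^2 + df) = {r2_from_t:.6f}")
```

With several covariates, the same expression applied to one coefficient's t-value gives that covariate's squared partial correlation rather than the overall $R^2$. As a rough illustration with the asker's approximate figures ($t \approx 31$, $n \approx 6000$, 4 covariates plus an intercept, so $df \approx 5995$), $961 / (961 + 5995) \approx 0.14$.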

7

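To put it more simply: you are very confident that the average response caused by your variables is not zero. But there are lots of other things, not in your regression, that cause the response to jump around.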



0

Several answers given are close but still wrong.

"The t-values are used to judge the accuracy of your estimate of the βi's" is the one that concerns me the most.

The T-value is merely an indication of the likelihood of random occurrence. Large means unlikely. Small means very likely. Positive and Negative don't matter to the likelihood interpretation.

"R2 measures the amount of variation in your response variable explained by your covariates" is correct.

(I would have commented but am not allowed by this platform yet.)


2
You seem to write about t-values as if they were p-values.
whuber

-4

The only way to deal with a small R squared is to check the following:

  1. Is your sample size large enough? If yes, go to step 2; if no, increase your sample size.
  2. How many covariates did you use in your model estimation? If more than one, as in your case, deal with the problem of multicollinearity among the covariates, or simply run the regression again, this time without the constant (known as beta zero).
  3. However, if the problem still persists, do a stepwise regression and select the model with a high R squared. I cannot recommend this, though, because it introduces bias in the covariates.

Licensed under cc by-sa 3.0 with attribution required.