Answers:
You should center the terms involved in the interaction to reduce collinearity, e.g.
set.seed(10204)
# Two predictors whose means are far from zero
x1 <- rnorm(1000, 10, 1)
x2 <- rnorm(1000, 10, 1)
y <- x1 + rnorm(1000, 5, 5) + x2*rnorm(1000) + x1*x2*rnorm(1000)
# Center each predictor at its sample mean
x1cent <- x1 - mean(x1)
x2cent <- x2 - mean(x2)
x1x2cent <- x1cent*x2cent
# m1 uses the raw predictors, m2 the centered ones
m1 <- lm(y ~ x1 + x2 + x1*x2)
m2 <- lm(y ~ x1cent + x2cent + x1cent*x2cent)
summary(m1)
summary(m2)
Output:
> summary(m1)

Call:
lm(formula = y ~ x1 + x2 + x1 * x2)

Residuals:
    Min      1Q  Median      3Q     Max
-344.62  -66.29   -1.44   66.05  392.22

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  193.333    335.281   0.577    0.564
x1           -15.830     33.719  -0.469    0.639
x2           -14.065     33.567  -0.419    0.675
x1:x2          1.179      3.375   0.349    0.727

Residual standard error: 101.3 on 996 degrees of freedom
Multiple R-squared: 0.002363, Adjusted R-squared: -0.0006416
F-statistic: 0.7865 on 3 and 996 DF, p-value: 0.5015
> summary(m2)

Call:
lm(formula = y ~ x1cent + x2cent + x1cent * x2cent)

Residuals:
    Min      1Q  Median      3Q     Max
-344.62  -66.29   -1.44   66.05  392.22

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)     12.513      3.203   3.907 9.99e-05 ***
x1cent          -4.106      3.186  -1.289    0.198
x2cent          -2.291      3.198  -0.716    0.474
x1cent:x2cent    1.179      3.375   0.349    0.727
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 101.3 on 996 degrees of freedom
Multiple R-squared: 0.002363, Adjusted R-squared: -0.0006416
F-statistic: 0.7865 on 3 and 996 DF, p-value: 0.5015
# Collinearity diagnostics (condition indexes) for both models; requires the perturb package
library(perturb)
colldiag(m1)
colldiag(m2)
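As a rough cross-check that needs only base R, the sketch below compares the correlation between a predictor and the product term before and after centering, and confirms that the two parameterizations give the same fit. It reuses the simulation above; `cor_raw`, `cor_cent`, `same_fit`, and `same_interaction` are illustrative names, not part of the original answer.

```r
set.seed(10204)
x1 <- rnorm(1000, 10, 1)
x2 <- rnorm(1000, 10, 1)
y  <- x1 + rnorm(1000, 5, 5) + x2 * rnorm(1000) + x1 * x2 * rnorm(1000)

x1cent <- x1 - mean(x1)
x2cent <- x2 - mean(x2)

# Correlation between a main effect and the product term
cor_raw  <- cor(x1, x1 * x2)              # raw predictors
cor_cent <- cor(x1cent, x1cent * x2cent)  # centered predictors

m1 <- lm(y ~ x1 * x2)
m2 <- lm(y ~ x1cent * x2cent)

# Centering only reparameterizes the model: fitted values and the
# interaction coefficient should agree up to numerical error
same_fit <- isTRUE(all.equal(unname(fitted(m1)), unname(fitted(m2)),
                             tolerance = 1e-6))
same_interaction <- isTRUE(all.equal(unname(coef(m1)["x1:x2"]),
                                     unname(coef(m2)["x1cent:x2cent"]),
                                     tolerance = 1e-6))

print(c(raw = cor_raw, centered = cor_cent))
print(c(same_fit = same_fit, same_interaction = same_interaction))
```

With predictors centered near 10, the raw product is strongly correlated with each main effect, which is why the standard errors of the main effects in `m1` are so much larger than in `m2` while the interaction row is unchanged.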
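Whether you center other variables is up to you; centering (as opposed to standardizing) a variable that is not involved in an interaction changes the meaning of the intercept, but nothing else, e.g.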
x1 <- rnorm(1000, 10, 1)
# x2 is the centered version of x1
x2 <- x1 - mean(x1)
y <- x1 + rnorm(1000, 5, 5)
# Same model fitted to the raw and the centered predictor
m1 <- lm(y ~ x1)
m2 <- lm(y ~ x2)
summary(m1)
summary(m2)
Output:
> summary(m1)

Call:
lm(formula = y ~ x1)

Residuals:
     Min       1Q   Median       3Q      Max
-16.5288  -3.3348   0.0946   3.4293  14.0678

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   6.5412     1.6003   4.087 4.71e-05 ***
x1            0.8548     0.1591   5.373 9.63e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.082 on 998 degrees of freedom
Multiple R-squared: 0.02812, Adjusted R-squared: 0.02714
F-statistic: 28.87 on 1 and 998 DF, p-value: 9.629e-08
> summary(m2)

Call:
lm(formula = y ~ x2)

Residuals:
     Min       1Q   Median       3Q      Max
-16.5288  -3.3348   0.0946   3.4293  14.0678

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  15.0965     0.1607  93.931  < 2e-16 ***
x2            0.8548     0.1591   5.373 9.63e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.082 on 998 degrees of freedom
Multiple R-squared: 0.02812, Adjusted R-squared: 0.02714
F-statistic: 28.87 on 1 and 998 DF, p-value: 9.629e-08
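The change in the intercept can be made concrete with a small sketch. It assumes a fresh simulation with its own seed (so the numbers will not match the output above), and the names `fit_raw`, `fit_cent`, and `xc` are illustrative. With a mean-centered predictor, the intercept of a simple regression is the sample mean of `y`, that is, the predicted value at the average `x`, while the slope stays the same.

```r
set.seed(1)
x  <- rnorm(1000, 10, 1)
xc <- x - mean(x)            # centered copy of x
y  <- x + rnorm(1000, 5, 5)

fit_raw  <- lm(y ~ x)
fit_cent <- lm(y ~ xc)

slope_raw      <- unname(coef(fit_raw)["x"])
slope_cent     <- unname(coef(fit_cent)["xc"])
intercept_raw  <- unname(coef(fit_raw)["(Intercept)"])
intercept_cent <- unname(coef(fit_cent)["(Intercept)"])

# Slopes agree; the centered intercept is the mean of y
print(c(slope_raw = slope_raw, slope_cent = slope_cent))
print(c(intercept_cent = intercept_cent, mean_y = mean(y)))

# The raw intercept is the prediction at x = 0, far outside the data
print(c(intercept_raw = intercept_raw,
        implied = mean(y) - slope_raw * mean(x)))
```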
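You should, however, take logs of variables because it makes sense to do so, or because the residuals from the model indicate that you should, not because the variables have a lot of variability. Regression makes no assumptions about the distribution of the variables; it makes assumptions about the distribution of the residuals.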
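A small simulated sketch of this point, under the assumption of an exponentially distributed predictor and normal errors (the data, the seed, and the helper `skew` are hypothetical and not from the original answer): the predictor is heavily skewed, yet the residuals are close to symmetric, so nothing here calls for a log transform.

```r
set.seed(42)
n <- 1000
x <- rexp(n, rate = 1)        # strongly right-skewed predictor
y <- 2 + 3 * x + rnorm(n)     # linear in x with normal errors

fit <- lm(y ~ x)

# Simple moment-based sample skewness (illustrative helper)
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_x     <- skew(x)
skew_resid <- skew(residuals(fit))

# The predictor is skewed, but the residuals are roughly symmetric
print(c(predictor = skew_x, residuals = skew_resid))
```

In practice, residual plots such as `plot(fit, which = 1:2)` are the usual way to check this.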
y <- x1 + rnorm(1000, 5, 5) + x2*rnorm(1000) + x1*x2*rnorm(1000)
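helps illustrate the answer. Conditional on x1 and x2, the mean is $x_1 + 5$ and the variance is $25 + x_2^2 + x_1^2 x_2^2$, so there is no interaction term in the generating model: the product x1*x2 only scales the noise.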
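The conditional moments of that generating line can be checked by simulation. Each `rnorm()` call is independent, so for fixed x1 and x2 the mean is x1 + 5 and the variance is 25 + x2^2 + x1^2 * x2^2. The sketch below fixes both predictors at 10, an arbitrary choice for illustration, and `n_draws`, `theory_mean`, and `theory_var` are illustrative names.

```r
set.seed(123)
n_draws <- 1e5
x1 <- 10   # held fixed for the check
x2 <- 10

# Same generating expression as in the answer, with x1 and x2 fixed
y <- x1 + rnorm(n_draws, 5, 5) + x2 * rnorm(n_draws) + x1 * x2 * rnorm(n_draws)

theory_mean <- x1 + 5                      # only x1 shifts the mean
theory_var  <- 25 + x2^2 + x1^2 * x2^2     # x2 and x1*x2 only scale noise

print(c(simulated = mean(y), theory = theory_mean))
print(c(simulated = var(y),  theory = theory_var))
```

The x1*x2 noise term dominates the variance, which is consistent with the residual standard error of about 101 and the near-zero R-squared in the first pair of fits.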