Should interaction terms in a hierarchical regression be built from centered variables? Which variables should we center?

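I am running a hierarchical regression analysis and have a few questions:

Do we compute the interaction terms from centered variables?
Apart from the dependent variable, do we have to center every continuous variable in the data set?
When we have to log some variables (because their SD is far higher than their mean), should we center the logged variable or the original one?

For example: variable "Turnover" ---> logged Turnover (because the SD is too high relative to the mean) ---> Centered_Turnover?

Or directly Turnover ---> Centered_Turnover (and we work with that one)?

Thanks!!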

interaction multicollinearity centering

— PhD Student

You should center the terms involved in the interaction to reduce collinearity, e.g.

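set.seed(10204)
# Two predictors, each with mean 10 and SD 1
x1 <- rnorm(1000, 10, 1)
x2 <- rnorm(1000, 10, 1)
y <- x1 + rnorm(1000, 5, 5) + x2*rnorm(1000) + x1*x2*rnorm(1000)

# Mean-center each predictor
x1cent <- x1 - mean(x1)
x2cent <- x2 - mean(x2)
# Product of the centered predictors (not used below; the formula builds it)
x1x2cent <- x1cent*x2cent

# Same model fitted to the raw and to the centered predictors
m1 <- lm(y ~ x1 + x2 + x1*x2)
m2 <- lm(y ~ x1cent + x2cent + x1cent*x2cent)

summary(m1)
summary(m2)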

Output:

> summary(m1)

Call:
lm(formula = y ~ x1 + x2 + x1 * x2)

Residuals:
    Min      1Q  Median      3Q     Max 
-344.62  -66.29   -1.44   66.05  392.22 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  193.333    335.281   0.577    0.564
x1           -15.830     33.719  -0.469    0.639
x2           -14.065     33.567  -0.419    0.675
x1:x2          1.179      3.375   0.349    0.727

Residual standard error: 101.3 on 996 degrees of freedom
Multiple R-squared:  0.002363,  Adjusted R-squared:  -0.0006416 
F-statistic: 0.7865 on 3 and 996 DF,  p-value: 0.5015

> summary(m2)

Call:
lm(formula = y ~ x1cent + x2cent + x1cent * x2cent)

Residuals:
    Min      1Q  Median      3Q     Max 
-344.62  -66.29   -1.44   66.05  392.22 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     12.513      3.203   3.907 9.99e-05 ***
x1cent          -4.106      3.186  -1.289    0.198    
x2cent          -2.291      3.198  -0.716    0.474    
x1cent:x2cent    1.179      3.375   0.349    0.727    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 101.3 on 996 degrees of freedom
Multiple R-squared:  0.002363,  Adjusted R-squared:  -0.0006416 
F-statistic: 0.7865 on 3 and 996 DF,  p-value: 0.5015
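The two fits above are the same model written in two parameterizations, which is why the residuals, R-squared, and interaction row match. As a minimal sketch (reusing the answer's simulation; the names `m_raw`, `m_cent`, and `x1_at_mean_x2` are introduced here for illustration), the centered main effect can be rebuilt from the raw coefficients as the slope of x1 evaluated at the mean of x2:

```r
set.seed(10204)
x1 <- rnorm(1000, 10, 1)
x2 <- rnorm(1000, 10, 1)
y  <- x1 + rnorm(1000, 5, 5) + x2 * rnorm(1000) + x1 * x2 * rnorm(1000)

x1cent <- x1 - mean(x1)
x2cent <- x2 - mean(x2)

m_raw  <- lm(y ~ x1 * x2)
m_cent <- lm(y ~ x1cent * x2cent)

b <- coef(m_raw)
# Slope of x1 at x2 = mean(x2), rebuilt from the raw fit: b1 + b3 * mean(x2)
x1_at_mean_x2 <- unname(b["x1"] + b["x1:x2"] * mean(x2))

# Each comparison should report TRUE (up to numerical tolerance)
all.equal(x1_at_mean_x2, unname(coef(m_cent)["x1cent"]))
all.equal(unname(b["x1:x2"]), unname(coef(m_cent)["x1cent:x2cent"]))
all.equal(unname(fitted(m_raw)), unname(fitted(m_cent)))
```

So centering does not change what the model predicts; it changes which question the main-effect rows answer (the effect of x1 when x2 is at its mean, rather than when x2 is 0).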


# Collinearity diagnostics (condition indexes) for the raw and the centered model
library(perturb)
colldiag(m1)
colldiag(m2)
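For readers without the perturb package, the collinearity reduction can also be checked in base R. This is a rough sketch, not a replacement for `colldiag`: it compares the correlation between a main effect and the product term, and a hand-computed variance inflation factor, before and after centering. The helper `vif_first` is hypothetical and written only for this example.

```r
set.seed(10204)
x1 <- rnorm(1000, 10, 1)
x2 <- rnorm(1000, 10, 1)
x1cent <- x1 - mean(x1)
x2cent <- x2 - mean(x2)

# Correlation between a main effect and the product term
cor_raw  <- cor(x1, x1 * x2)
cor_cent <- cor(x1cent, x1cent * x2cent)

# Hand-rolled VIF for the first predictor:
# 1 / (1 - R^2) from regressing it on the other predictor and the product
vif_first <- function(a, b) {
  r2 <- summary(lm(a ~ b + I(a * b)))$r.squared
  1 / (1 - r2)
}
vif_raw  <- vif_first(x1, x2)
vif_cent <- vif_first(x1cent, x2cent)

round(c(cor_raw = cor_raw, cor_cent = cor_cent,
        vif_raw = vif_raw, vif_cent = vif_cent), 3)
```

With predictors whose means are far from zero, as here, the raw product term is strongly correlated with its components, and centering removes most of that correlation.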

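Whether to center the other variables is up to you. Centering (as opposed to standardizing) a variable that is not involved in an interaction changes the meaning of the intercept but nothing else, e.g.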

x1 <- rnorm(1000, 10, 1)
x2 <- x1 - mean(x1)   # x2 is x1 centered at its mean
y <- x1 + rnorm(1000, 5, 5)
m1 <- lm(y ~ x1)      # raw predictor
m2 <- lm(y ~ x2)      # centered predictor

summary(m1)
summary(m2)

Output:

> summary(m1)

Call:
lm(formula = y ~ x1)

Residuals:
     Min       1Q   Median       3Q      Max 
-16.5288  -3.3348   0.0946   3.4293  14.0678 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   6.5412     1.6003   4.087 4.71e-05 ***
x1            0.8548     0.1591   5.373 9.63e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.082 on 998 degrees of freedom
Multiple R-squared:  0.02812,   Adjusted R-squared:  0.02714 
F-statistic: 28.87 on 1 and 998 DF,  p-value: 9.629e-08

> summary(m2)

Call:
lm(formula = y ~ x2)

Residuals:
     Min       1Q   Median       3Q      Max 
-16.5288  -3.3348   0.0946   3.4293  14.0678 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  15.0965     0.1607  93.931  < 2e-16 ***
x2            0.8548     0.1591   5.373 9.63e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.082 on 998 degrees of freedom
Multiple R-squared:  0.02812,   Adjusted R-squared:  0.02714 
F-statistic: 28.87 on 1 and 998 DF,  p-value: 9.629e-08
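The change in the intercept can be made explicit. In this sketch (fresh simulated data with an arbitrary seed; `fit_raw` and `fit_cent` are names chosen here), the slope is identical in both fits, while the centered intercept equals the mean of y, because it is the prediction at the mean of the predictor:

```r
set.seed(42)  # arbitrary seed for this sketch
x  <- rnorm(1000, 10, 1)
xc <- x - mean(x)
y  <- x + rnorm(1000, 5, 5)

fit_raw  <- lm(y ~ x)
fit_cent <- lm(y ~ xc)

# The slope is unchanged by centering
all.equal(unname(coef(fit_raw)["x"]), unname(coef(fit_cent)["xc"]))

# The centered intercept is the predicted y at mean(x),
# which in simple OLS equals mean(y)
all.equal(unname(coef(fit_cent)["(Intercept)"]), mean(y))

# The raw intercept is the predicted y at x = 0,
# far outside the observed range of x
unname(coef(fit_raw)["(Intercept)"])
```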

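However, you should take the log of a variable because doing so makes substantive sense, or because the residuals from the model indicate that you should, not because the variable has a lot of variability. Regression makes no assumptions about the distribution of the variables; it makes assumptions about the distribution of the residuals.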
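A small simulation can illustrate the point. The setup below is assumed for illustration only (a log-normal predictor, a linear relationship, normal errors; the `skew` helper is defined here to avoid extra packages): the predictor is heavily skewed, yet the residuals are well behaved, so nothing in the fit calls for a log transform.

```r
set.seed(7)  # arbitrary seed for this sketch
n <- 1000
x <- rlnorm(n, meanlog = 0, sdlog = 1)  # strongly right-skewed predictor
y <- 2 + 3 * x + rnorm(n)               # linear relationship, normal errors

fit <- lm(y ~ x)

# Simple moment-based skewness
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_x   <- skew(x)
skew_res <- skew(resid(fit))
c(skew_x = skew_x, skew_res = skew_res)

# Residual plots are the usual place to look before transforming:
# plot(fit, which = 1:2)
```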

— Peter Flom

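Thank you for your response, Peter! So I assume that I first have to log the variables (all predictors?), and then center only the independent variables needed to compute the interaction term. Another question: would you recommend centering or standardizing the variables? Again, thank you very much!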

— PhD Student

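Yes, log before centering. Standardizing and centering do different things; neither is wrong. Some people like to standardize; I usually prefer the "raw" variables.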

— Peter Flom
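The order and the two options can be sketched as follows. The `turnover` values are simulated stand-ins for the question's variable, not real data, and the variable names are chosen for this example:

```r
set.seed(3)  # arbitrary seed for this sketch
# Hypothetical right-skewed variable standing in for the question's "Turnover"
turnover <- rlnorm(500, meanlog = 8, sdlog = 1.5)

log_turnover      <- log(turnover)                      # 1. transform
centered_turnover <- log_turnover - mean(log_turnover)  # 2. then center

# Standardizing additionally divides by the SD,
# so coefficients are expressed per SD of the predictor
standardized_turnover <- as.numeric(scale(log_turnover))

c(mean_centered = mean(centered_turnover),
  sd_centered   = sd(centered_turnover),
  sd_standard   = sd(standardized_turnover))
```

Reversing the order would not work: a centered variable contains negative values, and their logarithm is undefined. Centering keeps the original units (here, log units), while standardizing rescales to SD units.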

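I don't see how defining the generating model as y <- x1 + rnorm(1000, 5, 5) + x2*rnorm(1000) + x1*x2*rnorm(1000) helps illustrate the answer. The mean is $x_1 + 5$ and the variance is $1 + 25 + 1 + 1$, so there is no interaction term in the generating model.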

— Rufo
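For comparison, here is a sketch of a generating model that does contain a fixed interaction effect. The coefficients (5, 1, 2, 0.5) and the error SD are assumptions made for this example and do not come from the thread:

```r
set.seed(11)  # arbitrary seed for this sketch
n  <- 1000
x1 <- rnorm(n, 10, 1)
x2 <- rnorm(n, 10, 1)
# Assumed coefficients: a fixed interaction effect of 0.5
y  <- 5 + 1 * x1 + 2 * x2 + 0.5 * x1 * x2 + rnorm(n, 0, 1)

x1c <- x1 - mean(x1)
x2c <- x2 - mean(x2)
fit_raw  <- lm(y ~ x1 * x2)
fit_cent <- lm(y ~ x1c * x2c)

# Interaction estimate and its standard error under each parameterization
int_raw  <- coef(summary(fit_raw))["x1:x2", c("Estimate", "Std. Error")]
int_cent <- coef(summary(fit_cent))["x1c:x2c", c("Estimate", "Std. Error")]
rbind(raw = int_raw, centered = int_cent)
```

With a fixed coefficient, both fits estimate the same interaction with the same standard error; centering affects only the main-effect rows and the intercept.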