I have a question about the best way to specify an interaction in a regression model. Consider the following data:
d <- structure(list(r = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("r1","r2"),
class = "factor"), s = structure(c(1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
.Label = c("s1","s2"), class = "factor"), rs = structure(c(1L, 1L,
1L,1L, 1L,2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L),
.Label = c("r1s1","r1s2", "r2s1", "r2s2"), class = "factor"),
y = c(19.3788027518437, 23.832287726332, 26.2533235300492,
15.962906892112, 24.2873740664331, 28.5181676764727, 25.2757801195961,
25.3601044326474, 25.3066440027202, 24.3298865128677, 32.5684219007394,
31.0048406654209, 31.671238316086, 34.1933764518288, 36.8784821769123,
41.6691435168277, 40.4669714825801, 39.2664137501106, 39.4884849591932,
49.247505535468)), .Names = c("r","s", "rs", "y"),
row.names = c(NA, -20L), class = "data.frame")
Two equivalent ways to specify the model with an interaction are:
lm0 <- lm(y ~ r*s, data=d)
lm1 <- lm(y ~ r + s + r:s, data=d)
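A quick way to check that the two formulas really describe the same model is to compare their expanded terms and coefficients. This is only a sketch: it rebuilds the data frame in a compact form (same factor layout and y values as the dput output above) so that it runs on its own.

```r
# Rebuild the example data: 2 x 2 design, 5 observations per cell.
d <- data.frame(
  r = factor(rep(c("r1", "r2"), each = 10)),
  s = factor(rep(rep(c("s1", "s2"), each = 5), times = 2)),
  y = c(19.3788027518437, 23.832287726332, 26.2533235300492,
        15.962906892112, 24.2873740664331, 28.5181676764727,
        25.2757801195961, 25.3601044326474, 25.3066440027202,
        24.3298865128677, 32.5684219007394, 31.0048406654209,
        31.671238316086, 34.1933764518288, 36.8784821769123,
        41.6691435168277, 40.4669714825801, 39.2664137501106,
        39.4884849591932, 49.247505535468)
)

lm0 <- lm(y ~ r * s, data = d)
lm1 <- lm(y ~ r + s + r:s, data = d)

# r * s is shorthand for r + s + r:s, so both formulas expand to the same terms.
attr(terms(lm0), "term.labels")
attr(terms(lm1), "term.labels")

# The estimated coefficients should therefore agree as well.
all.equal(coef(lm0), coef(lm1))
```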
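My question is whether the interaction could instead be specified with a new variable (rs) whose levels are the combinations of r and s: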
lm2 <- lm(y ~ r + s + rs, data=d)
What are the advantages and disadvantages of this approach? And why do the two approaches give different results?
summary(lm1)
lm(formula = y ~ r + s + r:s, data = d, x = TRUE)
coef.est coef.se
(Intercept) 21.94 1.46
rr2 11.32 2.07
ss2 3.82 2.07
rr2:ss2 4.95 2.92
---
n = 20, k = 4
residual sd = 3.27, R-Squared = 0.87
summary(lm2)
lm(formula = y ~ r + s + rs, data = d, x = TRUE)
coef.est coef.se
(Intercept) 21.94 1.46
rr2 11.32 2.07
ss2 8.76 2.07 # ss2 coef is different from lm1
rsr1s2 -4.95 2.92
---
n = 20, k = 4
residual sd = 3.27, R-Squared = 0.87
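One way to dig into the difference is to look at the full coefficient vector of lm2 and compare the fitted values of the two models. The sketch below assumes the same data as above (rebuilt compactly so it runs on its own; rs is recreated with paste0, which gives the same four levels). In lm2 the dummy columns for rs overlap with those for r and s, so lm() cannot estimate all of them and reports NA for the aliased ones.

```r
# Rebuild the example data: 2 x 2 design, 5 observations per cell.
d <- data.frame(
  r = factor(rep(c("r1", "r2"), each = 10)),
  s = factor(rep(rep(c("s1", "s2"), each = 5), times = 2)),
  y = c(19.3788027518437, 23.832287726332, 26.2533235300492,
        15.962906892112, 24.2873740664331, 28.5181676764727,
        25.2757801195961, 25.3601044326474, 25.3066440027202,
        24.3298865128677, 32.5684219007394, 31.0048406654209,
        31.671238316086, 34.1933764518288, 36.8784821769123,
        41.6691435168277, 40.4669714825801, 39.2664137501106,
        39.4884849591932, 49.247505535468)
)
d$rs <- factor(paste0(d$r, d$s))  # levels: r1s1, r1s2, r2s1, r2s2

lm1 <- lm(y ~ r + s + r:s, data = d)
lm2 <- lm(y ~ r + s + rs, data = d)

# lm2 has six candidate coefficients but only four are estimable;
# the remaining rs dummies come back as NA.
coef(lm2)
alias(lm2)  # shows which columns are linear combinations of the others

# Both models reproduce the four cell means, so the fits coincide.
all.equal(fitted(lm1), fitted(lm2))

# The lm2 coefficients are a re-expression of the lm1 ones:
# ss2 in lm2 is the s effect within r2, i.e. ss2 + rr2:ss2 from lm1.
coef(lm1)["ss2"] + coef(lm1)["rr2:ss2"]
coef(lm2)["ss2"]
```

This is consistent with the printed output: 3.82 + 4.95 is about 8.76 up to rounding, and the rsr1s2 coefficient is the lm1 interaction with the sign flipped. So the two fits appear to be the same model under a different parameterization, with lm2 harder to read because the meaning of ss2 depends on which rs columns happen to be dropped.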
What do attr(terms(lm1),"factors") and attr(terms(lm2),"factors") show? Is rs defined as interaction(r, s)?
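Both checks are easy to run. The sketch below again rebuilds the data so it is self-contained. Note that interaction(r, s) labels its levels differently from rs (with a dot separator and a different level order), so a cross-tabulation is a safer comparison than identical().

```r
# Rebuild the example data: 2 x 2 design, 5 observations per cell.
d <- data.frame(
  r = factor(rep(c("r1", "r2"), each = 10)),
  s = factor(rep(rep(c("s1", "s2"), each = 5), times = 2)),
  y = c(19.3788027518437, 23.832287726332, 26.2533235300492,
        15.962906892112, 24.2873740664331, 28.5181676764727,
        25.2757801195961, 25.3601044326474, 25.3066440027202,
        24.3298865128677, 32.5684219007394, 31.0048406654209,
        31.671238316086, 34.1933764518288, 36.8784821769123,
        41.6691435168277, 40.4669714825801, 39.2664137501106,
        39.4884849591932, 49.247505535468)
)
d$rs <- factor(paste0(d$r, d$s))

lm1 <- lm(y ~ r + s + r:s, data = d)
lm2 <- lm(y ~ r + s + rs, data = d)

# In lm1, the r:s column records that the term is built from r and s.
# In lm2, rs is just a third main effect; R does not know it is related to r and s.
f1 <- attr(terms(lm1), "factors")
f2 <- attr(terms(lm2), "factors")
f1
f2

# Each level of rs should map to exactly one level of interaction(r, s).
tab <- table(d$rs, interaction(d$r, d$s))
tab
```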