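Provided that you create the dummy variables (i.e., numeric variables) in the same way R does, the estimated coefficients will be identical. For example, let's create some fake data and fit a Poisson GLM using a factor. Note that the gl function creates a factor variable.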
> counts <- c(18,17,15,20,10,20,25,13,12)
> outcome <- gl(3,1,9)
> outcome
[1] 1 2 3 1 2 3 1 2 3
Levels: 1 2 3
> class(outcome)
[1] "factor"
> glm.1<- glm(counts ~ outcome, family = poisson())
> summary(glm.1)
Call:
glm(formula = counts ~ outcome, family = poisson())
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9666  -0.6713  -0.1696   0.8471   1.0494
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.0445     0.1260  24.165   <2e-16 ***
outcome2     -0.4543     0.2022  -2.247   0.0246 *
outcome3     -0.2930     0.1927  -1.520   0.1285
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 10.5814 on 8 degrees of freedom
Residual deviance: 5.1291 on 6 degrees of freedom
AIC: 52.761
Number of Fisher Scoring iterations: 4
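The values in this output can be checked by hand. Because outcome is the only predictor, the model has one free mean per level. For the canonical log link, the Poisson likelihood equations X^T(y - mu_hat) = 0 then force the fitted mean in each level to equal that level's sample mean. The notation below is introduced only for this check: beta_2 and beta_3 stand for the outcome2 and outcome3 coefficients, d_{ji} for the level indicators, and y-bar_j for the mean count in level j.

```latex
\begin{aligned}
\log \mu_i &= \beta_0 + \beta_2 d_{2i} + \beta_3 d_{3i},
  \qquad d_{ji} = \mathbf{1}\{\text{outcome}_i = j\} \\
\bar y_1 &= (18+20+25)/3 = 21, \quad
\bar y_2 = (17+10+13)/3 = 40/3, \quad
\bar y_3 = (15+20+12)/3 = 47/3 \\
\hat\beta_0 &= \log \bar y_1 = \log 21 \approx 3.0445 \\
\hat\beta_2 &= \log(\bar y_2/\bar y_1) = \log(40/63) \approx -0.4543 \\
\hat\beta_3 &= \log(\bar y_3/\bar y_1) = \log(47/63) \approx -0.2930
\end{aligned}
```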
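Since outcome has three levels, I created two dummy variables (dummy.1 = 1 if outcome = 2, and dummy.2 = 1 if outcome = 3; both are 0 otherwise) and fitted the model with these numeric variables: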
> dummy.1=rep(0,9)
> dummy.2=rep(0,9)
> dummy.1[outcome==2]=1
> dummy.2[outcome==3]=1
> glm.2<- glm(counts ~ dummy.1+dummy.2, family = poisson())
> summary(glm.2)
Call:
glm(formula = counts ~ dummy.1 + dummy.2, family = poisson())
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9666  -0.6713  -0.1696   0.8471   1.0494
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.0445     0.1260  24.165   <2e-16 ***
dummy.1      -0.4543     0.2022  -2.247   0.0246 *
dummy.2      -0.2930     0.1927  -1.520   0.1285
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 10.5814 on 8 degrees of freedom
Residual deviance: 5.1291 on 6 degrees of freedom
AIC: 52.761
Number of Fisher Scoring iterations: 4
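As you can see, the estimated coefficients are identical. However, you need to be careful when creating the dummy variables if you want to get the same results. For example, if I instead create the two dummy variables as dummy.1 = 1 if outcome = 1 and dummy.2 = 1 if outcome = 2 (both 0 otherwise), the estimates are: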
> dummy.1=rep(0,9)
> dummy.2=rep(0,9)
> dummy.1[outcome==1]=1
> dummy.2[outcome==2]=1
> glm.3<- glm(counts ~ dummy.1+dummy.2, family = poisson())
> summary(glm.3)
Call:
glm(formula = counts ~ dummy.1 + dummy.2, family = poisson())
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9666  -0.6713  -0.1696   0.8471   1.0494
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   2.7515     0.1459   18.86   <2e-16 ***
dummy.1       0.2930     0.1927    1.52    0.128
dummy.2      -0.1613     0.2151   -0.75    0.453
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 10.5814 on 8 degrees of freedom
Residual deviance: 5.1291 on 6 degrees of freedom
AIC: 52.761
Number of Fisher Scoring iterations: 4
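All three fits are the same model written in different coordinates. Their design matrices span the same column space, so the fitted means, deviance and AIC agree, and only the meaning of the coefficients changes. The following is a minimal sketch in R that rebuilds the three models from the data above and checks this. The names dummy.a and dummy.b are hypothetical stand-ins for the glm.3 dummies, chosen so they do not overwrite the glm.2 ones.

```r
counts  <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)

# Dummies with level 1 as the reference (as in glm.2)
dummy.1 <- as.numeric(outcome == 2)
dummy.2 <- as.numeric(outcome == 3)
# Dummies with level 3 as the reference (as in glm.3); hypothetical names
dummy.a <- as.numeric(outcome == 1)
dummy.b <- as.numeric(outcome == 2)

glm.1 <- glm(counts ~ outcome, family = poisson())
glm.2 <- glm(counts ~ dummy.1 + dummy.2, family = poisson())
glm.3 <- glm(counts ~ dummy.a + dummy.b, family = poisson())

# Same fitted means in all three parametrizations
all.equal(fitted(glm.1), fitted(glm.2))
all.equal(fitted(glm.1), fitted(glm.3))

# Same residual deviance (about 5.1291, as in the summaries)
c(deviance(glm.1), deviance(glm.2), deviance(glm.3))

# The fitted mean in each level equals that level's sample mean
tapply(fitted(glm.1), outcome, mean)

# glm.3 coefficients on the count scale: level-3 mean, then ratios to it
exp(coef(glm.3))
```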
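This is because, when you include the factor outcome in glm.1, R by default (treatment contrasts, contr.treatment) creates two dummy variables, outcome2 and outcome3. It defines them in the same way as dummy.1 and dummy.2 in glm.2. The first level of outcome is the reference level, i.e. the level at which all the other dummy variables (outcome2 and outcome3) are zero. In glm.3 the reference level is level 3 instead, so the coefficients are measured against a different baseline. The deviance and AIC, and therefore the fitted model itself, are the same in all three fits.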
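You can inspect the dummy coding that R chooses directly. The following is a minimal sketch using standard stats functions:

* contrasts() shows the treatment-contrast matrix for outcome, in which level 1 is the row of zeros.
* model.matrix() shows the columns R builds for counts ~ outcome.
* relevel() makes level 3 the reference, so that a factor-based fit gives the same coefficients as the hand-made dummies of glm.3.

The names X, outcome.ref3 and glm.4 are illustrative only.

```r
counts  <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)

# Default treatment contrasts: level 1 is the baseline (all zeros)
contrasts(outcome)

# Design matrix R builds for counts ~ outcome
X <- model.matrix(~ outcome)
colnames(X)

# The hand-made glm.2 dummies are exactly R's outcome2 and outcome3 columns
dummy.1 <- as.numeric(outcome == 2)
dummy.2 <- as.numeric(outcome == 3)
identical(unname(X[, "outcome2"]), dummy.1)
identical(unname(X[, "outcome3"]), dummy.2)

# Make level 3 the reference level; this matches the dummies used in glm.3
outcome.ref3 <- relevel(outcome, ref = "3")
glm.4 <- glm(counts ~ outcome.ref3, family = poisson())
coef(glm.4)
```

Changing the reference level with relevel() keeps the factor interface, including summary labels and anova(). Building numeric dummies by hand loses that convenience.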