我想知道如何在中指定默认起始值glm
。
这篇文章建议将默认值设置为零。这个人说背后有一个算法,但是相关的链接坏了。
我试图用算法跟踪拟合简单的逻辑回归模型:
set.seed(123)
x <- rnorm(100)
p <- 1/(1 + exp(-x))
y <- rbinom(100, size = 1, prob = p)
# to see parameter estimates in each step
trace(glm.fit, quote(print(coefold)), at = list(c(22, 4, 8, 4, 19, 3)))
首先,不指定初始值:
glm(y ~ x, family = "binomial")
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
NULL
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] 0.386379 1.106234
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] 0.3991135 1.1653971
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] 0.3995188 1.1669508
第一步,初始值为NULL
。
其次,我将起始值设置为零:
glm(y ~ x, family = "binomial", start = c(0, 0))
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] 0 0
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] 0.3177530 0.9097521
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] 0.3909975 1.1397163
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] 0.3994147 1.1666173
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] 0.3995191 1.1669518
我们可以看到,第一种方法和第二种方法之间的迭代是不同的。
为了查看由glm
我指定的初始值,我尝试只用一次迭代就适合模型:
glm(y ~ x, family = "binomial", control = list(maxit = 1))
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
NULL
Call: glm(formula = y ~ x, family = "binomial", control = list(maxit = 1))
Coefficients:
(Intercept) x
0.3864 1.1062
Degrees of Freedom: 99 Total (i.e. Null); 98 Residual
Null Deviance: 134.6
Residual Deviance: 115 AIC: 119
参数的估计(并不奇怪)对应于第二个迭代中第一种方法的估计,即,[1] 0.386379 1.106234
将这些值设置为初始值会导致与第一种方法中相同的迭代序列:
glm(y ~ x, family = "binomial", start = c(0.386379, 1.106234))
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] 0.386379 1.106234
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] 0.3991135 1.1653971
Tracing glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, .... step 22,4,8,4,19,3
[1] 0.3995188 1.1669508
所以问题是,这些值是如何计算的?
glm.fit
代码,但仍然不知道如何计算初始值。
start
值,则将它们用于传递给C_Cdqrls
例程的内容的计算中。如果您不这样做,则将计算传递的值(包括calleval(binomial()$initialize)
),但从glm.fit
不显式计算的值start
。花一两个小时研究glm.fit
代码。