Answers:
Here r is the vector of actual outcomes and pi is the vector of fitted values.
cost <- function(r, pi = 0) mean(abs(r-pi) > 0.5)
mycost <- function(r, pi){
  weight1 <- 1  # cost for getting 1 wrong
  weight0 <- 1  # cost for getting 0 wrong
  c1 <- (r == 1) & (pi < 0.5)   # logical vector - TRUE if actual 1 but predicted 0
  c0 <- (r == 0) & (pi >= 0.5)  # logical vector - TRUE if actual 0 but predicted 1
  return(mean(weight1 * c1 + weight0 * c0))
}
Then pass mycost as the cost argument to the cv.glm function.
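As a rough sketch of what that call might look like, the snippet below fits a logistic regression on simulated data and hands mycost to cv.glm. The data frame `dat`, its columns `x` and `y`, the seed, and the choice of K = 5 are all made up for illustration; only `cv.glm` and its `cost` and `K` arguments come from the boot package.

```r
library(boot)  # provides cv.glm

# Same cost function as above, repeated so the example is self-contained
mycost <- function(r, pi) {
  weight1 <- 1  # cost for getting 1 wrong
  weight0 <- 1  # cost for getting 0 wrong
  c1 <- (r == 1) & (pi < 0.5)   # actual 1, predicted 0
  c0 <- (r == 0) & (pi >= 0.5)  # actual 0, predicted 1
  mean(weight1 * c1 + weight0 * c0)
}

# Quick check on a toy vector: two of the four cases are misclassified
toy <- mycost(c(1, 0, 1, 0), c(0.9, 0.2, 0.3, 0.7))

# Simulated binary-outcome data (purely illustrative)
set.seed(1)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 + dat$x))

fit <- glm(y ~ x, family = binomial, data = dat)

# 5-fold cross-validation using mycost as the error measure
cv <- cv.glm(dat, fit, cost = mycost, K = 5)
cv$delta  # raw and adjusted cross-validation estimates
```

Raising weight1 or weight0 above 1 would make one kind of misclassification count more heavily in the reported error.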
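@SLi's answer already explains well what the cost function you defined does. I would add, though, that the cost function is used to compute the delta value returned by cv.glm, which is a measure of the cross-validation error. The key point is that delta (its first component) is a weighted average of the per-fold errors given by cost, with each fold weighted by its share of the observations. We can see this by inspecting the relevant part of the code: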
for (i in seq_len(ms)) {
    j.out <- seq_len(n)[(s == i)]
    j.in <- seq_len(n)[(s != i)]
    Call$data <- data[j.in, , drop = FALSE]
    d.glm <- eval.parent(Call)
    p.alpha <- n.s[i]/n # weight for this fold: its share of the n observations
    cost.i <- cost(glm.y[j.out], predict(d.glm, data[j.out,
        , drop = FALSE], type = "response"))
    CV <- CV + p.alpha * cost.i # add this fold's weighted cost to the running total
    cost.0 <- cost.0 - p.alpha * cost(glm.y, predict(d.glm,
        data, type = "response"))
}
The value returned by the function is:
list(call = call, K = K, delta = as.numeric(c(CV, CV + cost.0)),
    seed = seed)
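To make the weighted-average point concrete, here is a minimal sketch that recomputes the first element of delta by hand. `manual_cv` is a hypothetical helper written for this illustration (it is not part of boot), and it assumes a binomial model whose response column is named `y`. Leave-one-out is used for the comparison because every fold then has weight 1/n regardless of how cv.glm shuffles the observations, so the two numbers should agree up to floating-point error.

```r
library(boot)  # provides cv.glm

cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)

# Hypothetical helper: weighted average of per-fold costs, mirroring the
# CV <- CV + p.alpha * cost.i step. `s` gives the fold label of each row.
manual_cv <- function(data, formula, s, cost) {
  n <- nrow(data)
  cv <- 0
  for (i in unique(s)) {
    out <- s == i
    fit <- glm(formula, family = binomial, data = data[!out, , drop = FALSE])
    pred <- predict(fit, data[out, , drop = FALSE], type = "response")
    p_alpha <- sum(out) / n                        # fold's share of the data
    cv <- cv + p_alpha * cost(data$y[out], pred)   # assumes response is `y`
  }
  cv
}

# Simulated data (purely illustrative)
set.seed(42)
n <- 60
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 + dat$x))
fit <- glm(y ~ x, family = binomial, data = dat)

loo_boot   <- cv.glm(dat, fit, cost = cost)$delta[1]  # K defaults to n
loo_manual <- manual_cv(dat, y ~ x, s = seq_len(n), cost = cost)
all.equal(loo_boot, loo_manual)
```

For K smaller than n the same helper applies, but matching cv.glm exactly would also require reproducing its random fold assignment, which this sketch does not attempt.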