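For loess regression, my understanding as a non-statistician is that you can choose the span by visual inspection (plot the fit for a number of span values and pick the one with the least smoothing that still looks appropriate), or you can use cross-validation (CV) or generalized cross-validation (GCV). Below is the code I wrote for GCV of a loess regression, based on Takezawa's excellent book, Introduction to Nonparametric Regression (from p. 219).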
locv1 <- function(x1, y1, nd, span, ntrial)
{
    # GCV score of a loess fit for a single span value sp
    locvgcv <- function(sp, x1, y1)
    {
        nd <- length(x1)
        data1 <- data.frame(xx1 = x1, yy1 = y1)
        fit.lo <- loess(yy1 ~ xx1, data = data1, span = sp, family = "gaussian", degree = 2, surface = "direct")
        res <- residuals(fit.lo)
        # Diagonal of the hat matrix: fit each unit vector and keep the jj-th fitted value
        dhat2 <- function(x1, sp)
        {
            nd2 <- length(x1)
            diag1 <- diag(nd2)
            dhat <- rep(0, length = nd2)
            for(jj in 1:nd2){
                y2 <- diag1[, jj]
                data1 <- data.frame(xx1 = x1, yy1 = y2)
                fit.lo <- loess(yy1 ~ xx1, data = data1, span = sp, family = "gaussian", degree = 2, surface = "direct")
                ey <- fitted.values(fit.lo)
                dhat[jj] <- ey[jj]
            }
            return(dhat)
        }
        dhat <- dhat2(x1, sp)
        trhat <- sum(dhat)
        sse <- sum(res^2)
        # Leave-one-out CV score (computed but not returned)
        cv <- sum((res/(1 - dhat))^2)/nd
        gcv <- sse/(nd * (1 - (trhat/nd))^2)
        return(gcv)
    }
    # Use the span argument (not a global span1); sapply returns a numeric vector
    gcv <- sapply(span, locvgcv, x1 = x1, y1 = y1)
    #cvgcv <- unlist(cvgcv)
    #cv <- cvgcv[attr(cvgcv, "names") == "cv"]
    #gcv <- cvgcv[attr(cvgcv, "names") == "gcv"]
    return(gcv)
}
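For reference, the `cv` and `gcv` values computed inside `locvgcv` can be written out as formulas. The notation here is my own shorthand rather than anything taken from the book: $\hat{y}_i$ are the fitted values, $H$ is the hat (smoother) matrix with diagonal entries $h_{ii}$ (the `dhat` vector, so `trhat` is its trace), and $n$ is the number of observations (`nd`).

```latex
\mathrm{CV} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{y_i - \hat{y}_i}{1 - h_{ii}} \right)^2,
\qquad
\mathrm{GCV} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n \left( 1 - \operatorname{tr}(H)/n \right)^2}
```

GCV replaces each individual $h_{ii}$ in the CV score with the average value $\operatorname{tr}(H)/n$, which is why only the trace of the hat matrix is needed for it.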
With my data, I did the following:
nd <- length(Edge2$Distance)
xx <- Edge2$Distance
yy <- lcap
ntrial <- 50
span1 <- seq(from = 0.5, by = 0.01, length = ntrial)
output.lo <- locv1(xx, yy, nd, span1, ntrial)
#cv <- output.lo
# unlist() so that gcv is a plain numeric vector for plot() and min()
gcv <- unlist(output.lo)
plot(span1, gcv, type = "n", xlab = "span", ylab = "GCV")
points(span1, gcv, pch = 3)
lines(span1, gcv, lwd = 2)
# Index of the span with the smallest GCV score
pgcvmin <- seq(along = gcv)[gcv == min(gcv)]
spangcv <- span1[pgcvmin]
gcvmin <- gcv[pgcvmin]
points(spangcv, gcvmin, cex = 1, pch = 15)
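Sorry the code is rather sloppy; this was my first time using R. Still, it should give you an idea of how to do GCV for loess regression, so that you can find the best span in a more objective way than simple visual inspection. In the plot above, you are interested in the span that minimizes the function (the lowest point on the plotted "curve").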
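If refitting the model once per observation is too slow, a shorter variant is possible. This is only a minimal sketch, not the method from the book. It assumes that the fitted `loess` object exposes the trace of the smoother matrix as `trace.hat` and the number of observations as `n`, and it uses the default `surface = "interpolate"` rather than `surface = "direct"`, so the scores may differ slightly from those produced by the code above. The helper name `gcv_loess` and the simulated data are made up for illustration.

```r
# Hypothetical helper: GCV score of a loess fit at a single span.
# Assumes the fitted object carries `n` and `trace.hat` components.
gcv_loess <- function(x, y, span) {
  fit <- loess(y ~ x, span = span, degree = 2, family = "gaussian")
  n <- fit$n
  sse <- sum(residuals(fit)^2)
  # Same GCV formula as in locvgcv, with the trace taken from the fit
  sse / (n * (1 - fit$trace.hat / n)^2)
}

# Simulated data, for illustration only
set.seed(1)
x <- seq(0, 10, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.3)

# Score a grid of candidate spans and keep the one with the smallest GCV
spans <- seq(0.2, 0.9, by = 0.05)
gcv_vals <- sapply(spans, function(s) gcv_loess(x, y, s))
best_span <- spans[which.min(gcv_vals)]
```

Plotting `gcv_vals` against `spans` gives the same kind of curve as above, with `best_span` at its lowest point.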