Bootstrap prediction intervals



Is there any bootstrap technique available to compute prediction intervals for point predictions obtained e.g. via linear regression or another regression method (k-nearest neighbours, regression trees, etc.)?

Somehow I feel that the sometimes-suggested approach of just bootstrapping the point prediction (see e.g. Prediction intervals for kNN regression) provides not a prediction interval but a confidence interval.

An example in R

# STEP 1: GENERATE DATA

set.seed(34345)

n <- 100 
x <- runif(n)
y <- 1 + 0.2*x + rnorm(n)
data <- data.frame(x, y)


# STEP 2: COMPUTE CLASSIC 95%-PREDICTION INTERVAL
fit <- lm(y ~ x)
plot(fit) # not shown but looks fine with respect to all relevant aspects

# Classic prediction interval based on standard error of forecast
predict(fit, list(x = 0.1), interval = "p")
# -0.6588168 3.093755

# Classic confidence interval based on standard error of estimation
predict(fit, list(x = 0.1), interval = "c")
# 0.893388 1.54155


# STEP 3: NOW BY BOOTSTRAP
B <- 1000
pred <- numeric(B)
for (i in 1:B) {
  boot <- sample(n, n, replace = TRUE)
  fit.b <- lm(y ~ x, data = data[boot,])
  pred[i] <- predict(fit.b, list(x = 0.1))
}
quantile(pred, c(0.025, 0.975))
# 0.8699302 1.5399179

Obviously, the 95% basic bootstrap interval matches the 95% confidence interval, not the 95% prediction interval. So my question: how do I do this properly?


At least in the case of ordinary least squares, you need more than just the point predictions. You would want to use the estimated residuals to construct the prediction interval as well.
Kodiologist


@duplo: Thanks for pointing this out. The correct length of the classic prediction interval depends directly on the normality assumption for the error term, so if that assumption is too optimistic, a bootstrap version derived from it will surely be too optimistic as well. I wonder whether there is a general bootstrap method that works for regression (not necessarily OLS).
Michael M

I think conformal inference may be what you are looking for: it lets you construct resampling-based prediction intervals with valid finite-sample coverage that are not overly conservative. There is a good paper at arxiv.org/pdf/1604.04173.pdf that works as an introduction to the topic, and there is an R package at github.com/ryantibs/conformal.
Simon Boge Brant
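For reference, a minimal sketch of the split conformal idea (my own illustration, not the linked package), reusing n and data from the example above; the plain empirical quantile below ignores the small finite-sample correction a full implementation would apply:

# Split conformal sketch: fit on one half, calibrate on the other half.
set.seed(1)
train  <- sample(n, n/2)                    # random 50/50 split
fit.tr <- lm(y ~ x, data = data[train, ])
scores <- abs(data$y[-train] - predict(fit.tr, data[-train, ]))  # calibration scores
q.hat  <- quantile(scores, 0.95)
predict(fit.tr, list(x = 0.1)) + c(-1, 1)*q.hat  # approximate 95% prediction interval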

Answers:



The method laid out below is the one described in Section 6.3.3 of Davison and Hinkley (1997), _Bootstrap Methods and Their Application_. Thanks to Glen_b and his comment here. Given that there are several questions on Cross Validated about this topic, I thought it was worth a write-up.

The linear regression model is:

$$Y_i = X_i \beta + \epsilon_i$$

We have data, $i = 1, 2, \ldots, N$, which we use to estimate $\beta$:

$$\hat{\beta}_{OLS} = \left(X'X\right)^{-1}X'Y$$

Now, we want to predict what $Y$ will be for a new data point, given that we know $X$ for it. This is the prediction problem. Let's call the new $X$ (which we know) $X_{N+1}$ and the new $Y$ (which we would like to predict) $Y_{N+1}$. The usual prediction (if we are assuming that the $\epsilon_i$ are iid and uncorrelated with $X$) is:

$$Y^p_{N+1} = X_{N+1}\hat{\beta}_{OLS}$$

The forecast error made by this prediction is:

$$e^p_{N+1} = Y_{N+1} - Y^p_{N+1}$$

We can re-write this equation like:

$$Y_{N+1} = Y^p_{N+1} + e^p_{N+1}$$

Now, $Y^p_{N+1}$ we have already calculated. So, if we want to bound $Y_{N+1}$ in an interval, say, 90% of the time, all we need to do is estimate consistently the 5th and 95th percentiles/quantiles of $e^p_{N+1}$, call them $e^5$ and $e^{95}$, and the prediction interval will be $\left[Y^p_{N+1} + e^5,\, Y^p_{N+1} + e^{95}\right]$.

How do we estimate the quantiles/percentiles of $e^p_{N+1}$? Well, we can write:

$$e^p_{N+1} = Y_{N+1} - Y^p_{N+1} = X_{N+1}\beta + \epsilon_{N+1} - X_{N+1}\hat{\beta}_{OLS} = X_{N+1}\left(\beta - \hat{\beta}_{OLS}\right) + \epsilon_{N+1}$$

The strategy will be to sample (in a bootstrap kind of way) many times from $e^p_{N+1}$ and then calculate the percentiles in the usual way. So, maybe we will sample 10,000 times from $e^p_{N+1}$, and then estimate the 5th and 95th percentiles as the 500th and 9,500th smallest members of the sample.
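(In R, these order statistics are exactly what quantile() with type = 1, the inverse of the empirical CDF, picks out; the snippet assumes a vector of draws such as the ep.draws created in the code further down.)

# 500th and 9,500th smallest of 10,000 draws, as sample order statistics:
quantile(ep.draws, probs = c(0.05, 0.95), type = 1)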

To draw on $X_{N+1}\left(\beta - \hat{\beta}_{OLS}\right)$, we can bootstrap the errors (bootstrapping cases would be fine, too, but we are assuming iid errors anyway). So, on each bootstrap replication, you draw $N$ times with replacement from the variance-adjusted residuals (see the next paragraph) to get $\epsilon^*_i$, then make new $Y^*_i = X_i\hat{\beta}_{OLS} + \epsilon^*_i$, then run OLS on the new dataset $(Y^*, X)$ to get this replication's $\beta^*_r$. At last, this replication's draw on $X_{N+1}\left(\beta - \hat{\beta}_{OLS}\right)$ is $X_{N+1}\left(\hat{\beta}_{OLS} - \beta^*_r\right)$.

Given that we are assuming iid $\epsilon$, the natural way to sample from the $\epsilon_{N+1}$ part of the equation is to use the residuals we have from the regression, $\{e_1, e_2, \ldots, e_N\}$. Residuals have different and generally too small variances, so we will want to sample from $\{s_1 - \bar{s}, s_2 - \bar{s}, \ldots, s_N - \bar{s}\}$, the variance-corrected residuals, where $s_i = e_i/\sqrt{1-h_i}$ and $h_i$ is the leverage of observation $i$.

And, finally, the algorithm for making a 90% prediction interval for $Y_{N+1}$, given that $X$ is $X_{N+1}$, is:

  1. Make the prediction $Y^p_{N+1} = X_{N+1}\hat{\beta}_{OLS}$.
  2. Make the variance-adjusted residuals, $\{s_1 - \bar{s}, s_2 - \bar{s}, \ldots, s_N - \bar{s}\}$, where $s_i = e_i/\sqrt{1-h_i}$.
  3. For replications $r = 1, 2, \ldots, R$:
    • Draw $N$ times on the adjusted residuals to make bootstrap residuals $\{\epsilon^*_1, \epsilon^*_2, \ldots, \epsilon^*_N\}$
    • Generate bootstrap $Y^* = X\hat{\beta}_{OLS} + \epsilon^*$
    • Calculate the bootstrap OLS estimator for this replication, $\beta^*_r = \left(X'X\right)^{-1}X'Y^*$
    • Obtain the bootstrap residuals from this replication, $e^*_r = Y^* - X\beta^*_r$
    • Calculate the bootstrap variance-adjusted residuals from this replication, $s^* - \bar{s}^*$
    • Draw one of the bootstrap variance-adjusted residuals from this replication, $\epsilon^*_{N+1,r}$
    • Calculate this replication's draw on $e^p_{N+1}$, $e^{p*}_r = X_{N+1}\left(\hat{\beta}_{OLS} - \beta^*_r\right) + \epsilon^*_{N+1,r}$
  4. Find the 5th and 95th percentiles of the simulated $e^{p*}_r$, $e^5$ and $e^{95}$
  5. The 90% prediction interval for $Y_{N+1}$ is $\left[Y^p_{N+1} + e^5,\, Y^p_{N+1} + e^{95}\right]$.

Here is R code:

# This script gives an example of the procedure to construct a prediction interval
# for a linear regression model using a bootstrap method.  The method is the one
# described in Section 6.3.3 of Davison and Hinkley (1997),
# _Bootstrap Methods and Their Application_.


set.seed(12344321)

# Generate bivariate regression data
x <- runif(n=100,min=0,max=100)
y <- 1 + x + (rexp(n=100,rate=0.25)-4)

my.reg <- lm(y~x)
summary(my.reg)

# Predict y for x=78:
y.p <- coef(my.reg)["(Intercept)"] + coef(my.reg)["x"]*78
y.p

# Create adjusted residuals
leverage <- influence(my.reg)$hat
my.s.resid <- residuals(my.reg)/sqrt(1-leverage)
my.s.resid <- my.s.resid - mean(my.s.resid)



the.replication <- function(reg,s,x_Np1=0){
  # Make bootstrap residuals
  ep.star <- sample(s,size=length(reg$residuals),replace=TRUE)

  # Make bootstrap Y
  y.star <- fitted(reg)+ep.star

  # Do bootstrap regression
  x <- model.frame(reg)[,2]
  bs.reg <- lm(y.star~x)

  # Create bootstrapped adjusted residuals
  bs.lev <- influence(bs.reg)$hat
  bs.s   <- residuals(bs.reg)/sqrt(1-bs.lev)
  bs.s   <- bs.s - mean(bs.s)

  # Calculate draw on prediction error
  xb.xb <- coef(reg)["(Intercept)"] - coef(bs.reg)["(Intercept)"]
  xb.xb <- xb.xb + (coef(reg)["x"] - coef(bs.reg)["x"])*x_Np1
  return(unname(xb.xb + sample(bs.s,size=1)))
}

# Do bootstrap with 10,000 replications
ep.draws <- replicate(n=10000,the.replication(reg=my.reg,s=my.s.resid,x_Np1=78))

# Create prediction interval
y.p+quantile(ep.draws,probs=c(0.05,0.95))

# prediction interval using normal assumption
predict(my.reg,newdata=data.frame(x=78),interval="prediction",level=0.90)


# Quick and dirty Monte Carlo to see which prediction interval is better
# That is, what are the 5th and 95th percentiles of Y_{N+1}
# 
# To do it properly, I guess we would want to do the whole procedure above
# 10,000 times and then see what percentage of the time each prediction 
# interval covered Y_{N+1}

y.np1 <- 1 + 78 + (rexp(n=10000,rate=0.25)-4)
quantile(y.np1,probs=c(0.05,0.95))
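And here is a rough sketch of the coverage check mentioned in the comments above (my addition, not from Davison and Hinkley): repeat the whole procedure many times, with fewer bootstrap replications to keep the run time manageable, and count how often each 90% interval covers a fresh draw of $Y_{N+1}$.

# Monte Carlo coverage check: simulate fresh data, build both 90% intervals,
# and record whether each covers a newly drawn Y_{N+1} at x = 78.
n.sim <- 200   # Monte Carlo repetitions (kept small for speed)
cover.boot <- cover.norm <- logical(n.sim)
for (sim in 1:n.sim) {
  sim.d <- data.frame(x = runif(n=100, min=0, max=100))
  sim.d$y <- 1 + sim.d$x + (rexp(n=100, rate=0.25) - 4)
  reg.s <- lm(y ~ x, data = sim.d)
  lev.s <- influence(reg.s)$hat
  s.s <- residuals(reg.s)/sqrt(1 - lev.s)
  s.s <- s.s - mean(s.s)
  y.p.s <- predict(reg.s, newdata = data.frame(x = 78))
  ep.s <- replicate(n = 500, the.replication(reg = reg.s, s = s.s, x_Np1 = 78))
  pi.boot <- y.p.s + quantile(ep.s, probs = c(0.05, 0.95))
  pi.norm <- predict(reg.s, newdata = data.frame(x = 78),
                     interval = "prediction", level = 0.90)[2:3]
  y.new <- 1 + 78 + (rexp(n = 1, rate = 0.25) - 4)   # a fresh Y_{N+1}
  cover.boot[sim] <- pi.boot[1] <= y.new & y.new <= pi.boot[2]
  cover.norm[sim] <- pi.norm[1] <= y.new & y.new <= pi.norm[2]
}
mean(cover.boot)   # bootstrap interval: should be close to 0.90
mean(cover.norm)   # normal-theory interval, for comparison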

Thank you for the useful, detailed explanation. Following these lines, I think that a general technique outside OLS (tree-based techniques, nearest neighbours, etc.) won't be easily available, right?
Michael M

There is this one for random forests: stats.stackexchange.com/questions/49750/… which sounds similar.
Bill

As far as I can tell, if you abstract $X\beta$ to $f(X, \theta)$, this technique works for any model.
shadowtalker
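To make that concrete, here is a hedged sketch of the generic version (my own, hypothetical boot.pi helper, not from the answer): plug in any fit/predict pair. The OLS-specific leverage adjustment is simply dropped here and raw centred residuals are used instead, so the intervals are only approximate outside OLS:

# Generic residual-bootstrap prediction interval for any regression method.
# fit.fun(x, y) must return a fitted model; pred.fun(fit, x) must return predictions.
boot.pi <- function(x, y, x.new, fit.fun, pred.fun, R = 2000, level = 0.90) {
  fit0  <- fit.fun(x, y)
  pred0 <- pred.fun(fit0, x.new)          # point prediction f(x.new, theta-hat)
  res   <- y - pred.fun(fit0, x)          # in-sample residuals
  res   <- res - mean(res)                # centred (no leverage adjustment here)
  ep <- replicate(R, {
    y.star   <- pred.fun(fit0, x) + sample(res, length(y), replace = TRUE)
    fit.star <- fit.fun(x, y.star)
    res.star <- y.star - pred.fun(fit.star, x)
    res.star <- res.star - mean(res.star)
    (pred0 - pred.fun(fit.star, x.new)) + sample(res.star, 1)
  })
  alpha <- (1 - level)/2
  pred0 + quantile(ep, probs = c(alpha, 1 - alpha))
}

# Example with OLS (any other regression could be slotted in):
# boot.pi(x, y, x.new = 78,
#         fit.fun  = function(x, y) lm(y ~ x, data = data.frame(x, y)),
#         pred.fun = function(fit, x) predict(fit, data.frame(x = x)))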

How do you generalise the "variance adjusted residuals" - the OLS approach relies on the leverage - is there a leverage calculation for an arbitrary f(X) estimator?
David Waterworth