How can I obtain p-values for the coefficients of a bootstrapped regression?


From Robert Kabacoff's Quick-R, I have the following:

# Bootstrap 95% CI for regression coefficients 
library(boot)
# function to obtain regression weights 
bs <- function(formula, data, indices) {
  d <- data[indices,] # allows boot to select sample 
  fit <- lm(formula, data=d)
  return(coef(fit)) 
} 
# bootstrapping with 1000 replications 
results <- boot(data=mtcars, statistic=bs, 
     R=1000, formula=mpg~wt+disp)

# view results
results
plot(results, index=1) # intercept 
plot(results, index=2) # wt 
plot(results, index=3) # disp 

# get 95% confidence intervals 
boot.ci(results, type="bca", index=1) # intercept 
boot.ci(results, type="bca", index=2) # wt 
boot.ci(results, type="bca", index=3) # disp

How can I get p-values for the bootstrapped regression coefficients, i.e. for the test $H_0:\, b_j=0$?


— ECII

What do you mean by "p-value"? A test of which null hypothesis, specifically?

— Brian Diggs, 2014

Corrected: $H_0: b_j = 0$

— ECII, 2014


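You already have $p<0.05$ / $p>0.05$, depending on whether the 95% confidence interval excludes or contains 0. You cannot get more detail than that, because the distribution of the parameter in the bootstrap is not parametric (so you cannot get the probability of the value being 0).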

— Brian Diggs, 2014
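A minimal sketch of the rule described in the comment above. It assumes a simple percentile interval (the question itself uses BCa intervals via `boot.ci`), computed directly from the replicates that `boot` stores in `results$t`, one column per coefficient. The seed and the name `reject_at_5pct` are illustrative additions:

```r
library(boot)

# Same case-resampling statistic as in the question
bs <- function(formula, data, indices) {
  d <- data[indices, ]              # rows selected by boot
  coef(lm(formula, data = d))
}

set.seed(1)
results <- boot(data = mtcars, statistic = bs, R = 1000, formula = mpg ~ wt + disp)

# 2 x 3 matrix: 2.5% and 97.5% percentiles of each bootstrapped coefficient
ci <- apply(results$t, 2, quantile, probs = c(0.025, 0.975))

# TRUE when 0 lies outside the 95% percentile interval,
# i.e. H0: b_j = 0 is rejected at the 5% level
reject_at_5pct <- ci[1, ] > 0 | ci[2, ] < 0
names(reject_at_5pct) <- names(results$t0)
reject_at_5pct
```

This only yields the reject / do-not-reject decision at one level, which is exactly the limitation raised in the follow-up comments.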

If you cannot assume a distribution, how do you know that p < 0.05 when the CI does not contain 0? That holds for the z or t distribution.

— ECII, 2014

I see, but then you can only say p < 0.05; you cannot attach a specific value, right?

— ECII, 2014


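Just another, somewhat simplistic variant, but I think it gets the message across without explicitly using the boot library, whose syntax may confuse some people.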

We have a linear model: $y = X \beta + \epsilon$, $\quad \epsilon \sim N(0,\sigma^2)$.

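The following is a parametric bootstrap of that linear model, meaning that we do not resample the original data but instead generate new data from the fitted model. In addition, we assume that the bootstrap distribution of the regression coefficients $\beta$ is symmetric and translation invariant (roughly speaking, we can shift its axis without affecting its properties). The idea behind this is that the fluctuations in the $\beta$s are due to $\epsilon$, so with enough samples they should provide a good approximation of the true distribution of the $\beta$s. As before, we test $H_0 : 0 = \beta_j$, and we define the p-value as "the probability, given the null hypothesis for a data probability distribution, that the outcome would be as extreme as, or more extreme than, the observed outcome" (where the observed outcomes in this case are the $\beta$s we obtained for the original model). So here goes: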

# Number of bootstrap replicates
N          <- 2^12
# Linear model to bootstrap
Model2Boot <- lm(mpg ~ wt + disp, mtcars)
# Values of the model coefficients
Betas      <- coefficients(Model2Boot)
# Number of coefficients to test
M          <- length(Betas)
# N x M matrix to hold the bootstrap results
BtStrpRes  <- matrix(0, nrow = N, ncol = M)

for (i in 1:N) {
  # Simulate one response vector from the model we assume to be true
  # and save the refitted coefficients in the i-th row of BtStrpRes
  BtStrpRes[i, ] <- coefficients(lm(unlist(simulate(Model2Boot)) ~ wt + disp, mtcars))
}

# Get the p-values for the coefficients: share of centred replicates
# whose absolute value exceeds the absolute observed coefficient
P_val1 <- mean(abs(BtStrpRes[, 1] - mean(BtStrpRes[, 1])) > abs(Betas[1]))
P_val2 <- mean(abs(BtStrpRes[, 2] - mean(BtStrpRes[, 2])) > abs(Betas[2]))
P_val3 <- mean(abs(BtStrpRes[, 3] - mean(BtStrpRes[, 3])) > abs(Betas[3]))

# and some parametric bootstrap confidence intervals (2.5%, 97.5%)
ConfInt1 <- quantile(BtStrpRes[, 1], c(0.025, 0.975))
ConfInt2 <- quantile(BtStrpRes[, 2], c(0.025, 0.975))
ConfInt3 <- quantile(BtStrpRes[, 3], c(0.025, 0.975))

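As mentioned, the whole idea is that you obtain a distribution of $\beta$s that approximates their true distribution. (Obviously this code is not optimized for speed but for readability. :) )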

— usεr11852
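Because the loop above favors readability, here is a sketch of the same parametric bootstrap that draws all N responses in one call with `simulate(fit, nsim = N)` (which returns a data frame with one simulated response per column) and refits with `apply`. It applies the same centred-replicate p-value rule as the answer; the object names and the seed are illustrative additions:

```r
# Parametric bootstrap of lm(mpg ~ wt + disp), vectorised over simulations
set.seed(1)
N     <- 2^12
fit   <- lm(mpg ~ wt + disp, data = mtcars)
betas <- coef(fit)

# N simulated response vectors drawn from the fitted model (one column each)
sims <- simulate(fit, nsim = N)

# Refit the model to every simulated response; result is an N x 3 matrix
boot_coefs <- t(apply(sims, 2, function(y) coef(lm(y ~ wt + disp, data = mtcars))))

# Centre each column of replicates, then compare with |observed coefficient|
centred <- sweep(boot_coefs, 2, colMeans(boot_coefs))
p_vals  <- colMeans(abs(centred) > rep(abs(betas), each = N))

# Percentile intervals (2.5%, 97.5%), one column per coefficient
conf_int <- apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975))
p_vals
```

`rep(abs(betas), each = N)` lines up with the column-major layout of `centred`, so each column is compared with its own observed coefficient.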


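The community and @BrianDiggs may correct me if I am wrong, but I believe you can get a p-value for your problem as follows. The p-value of a two-sided test is defined as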

$2\cdot\min[P(X \le x \mid H_0),\, P(X \ge x \mid H_0)]$

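Therefore, if you order the bootstrapped coefficients by size and determine the proportion that lies above zero and the proportion that lies below zero, twice the smaller of the two proportions should give you the p-value.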

In such cases I usually use the following function:

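twosidep <- function(data) {
  p1 <- sum(data > 0) / length(data)  # proportion of bootstrap estimates above zero
  p2 <- sum(data < 0) / length(data)  # proportion of bootstrap estimates below zero
  p <- min(p1, p2) * 2                # two-sided p-value
  return(p)
}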

— tomka
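A usage sketch, assuming the Quick-R `boot` setup from the question: `results$t` holds the R × 3 matrix of bootstrapped coefficients and `results$t0` the original estimates, so the function can be applied column by column. The function is repeated so the block runs on its own; the seed and `p_values` are illustrative:

```r
library(boot)

# Two-sided bootstrap p-value: twice the smaller tail proportion around zero
twosidep <- function(data) {
  p1 <- mean(data > 0)
  p2 <- mean(data < 0)
  min(p1, p2) * 2
}

# Case-resampling statistic from the question
bs <- function(formula, data, indices) {
  d <- data[indices, ]
  coef(lm(formula, data = d))
}

set.seed(1)
results <- boot(data = mtcars, statistic = bs, R = 1000, formula = mpg ~ wt + disp)

# One p-value per coefficient: (Intercept), wt, disp
p_values <- apply(results$t, 2, twosidep)
names(p_values) <- names(results$t0)
p_values
```

For a percentile interval, twice the smaller tail proportion is (up to how quantiles are interpolated) roughly the smallest α at which the (1 − α) interval excludes 0, so it attaches a specific value to the $p<0.05$ / $p>0.05$ decision discussed in the comments. With R = 1000, the smallest non-zero value it can return is 2/1000 = 0.002.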


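The bootstrap can be used to compute $p$-values, but it requires substantial changes to your code. Since I am not familiar with R, I can only give you a reference where you can look up what you need: Chapter 4 of Davison and Hinkley (1997).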

Davison, A. C. and Hinkley, D. V. 1997. Bootstrap Methods and Their Application. Cambridge: Cambridge University Press.

— Maarten Buis