Computing p-values using the bootstrap with R


28

I am using the "boot" package to compute an approximate two-sided bootstrapped p-value, but the result is too far away from the p-value given by t.test. I can't figure out what I am doing wrong in my R code. Can someone give me a hint?

time = c(14,18,11,13,18,17,21,9,16,17,14,15,
         12,12,14,13,6,18,14,16,10,7,15,10)
group=c(rep(1:2, each=12))
sleep = data.frame(time, group)

require(boot)
# statistic for boot: difference in group means on the resampled rows
# (note: this masks base::diff)
diff = function(d1, i){
    d = d1[i,]
    Mean = tapply(X = d$time, INDEX = d$group, mean)
    Diff = Mean[1] - Mean[2]
    Diff
}

set.seed(1234)
b3 = boot(data = sleep, statistic = diff, R = 5000, strata=sleep$group)

pvalue = mean(abs(b3$t) > abs(b3$t0))
pvalue 

The two-sided bootstrapped p-value comes out at 0.4804, but the two-sided p-value of t.test is 0.04342. The two p-values differ by a factor of roughly 11. How can this be?
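For reference, the t.test figure quoted above presumably comes from R's default Welch two-sample test on the same data, e.g.:

t.test(time ~ group, data = sleep)  # two-sided p-value is about 0.043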


Why does b3$t0 have two entries?
Xi'an

1
It's a name! (tapply returns a named vector, so the difference keeps the name "1" and prints on two lines.)
Elvis

2
You are computing the p-value incorrectly. The documentation says t0 is the observed statistic, not the null distribution that the notation might suggest. You need to derive an estimate of the sampling distribution under the null. See my answer for more info. Try mean(abs(b3$t0) < abs(b3$t - mean(b3$t))) for a bias-uncorrected test.
AdamO

Answers:


31

You are using the bootstrap to generate data from the empirical distribution of the observed data. This can be useful to get a confidence interval on the difference of the two means:

> quantile(b3$t,c(0.025,0.975))
     2.5%     97.5% 
0.4166667 5.5833333 

To get a p-value, you need to generate permutations under the null hypothesis. This can be done, for example, like this:

diff2 = function(d1,i){
    d = d1; 
    d$group <- d$group[i];  # randomly re-assign groups
    Mean= tapply(X=d$time, INDEX=d$group, mean)
    Diff = Mean[1]-Mean[2]
    Diff
}

> set.seed(1234)
> b4 = boot(data = sleep, statistic = diff2, R = 5000)
> mean(abs(b4$t) > abs(b4$t0))
[1] 0.046

In this solution, the size of the groups is not fixed: each individual is randomly reassigned a group by bootstrapping from the initial set of group labels. This seems legitimate to me, but a more classical solution is to fix the number of individuals in each group, so that you permute the group labels instead of bootstrapping them (this is usually dictated by the experimental design, where the group sizes are fixed in advance):

> R <- 10000; d <- sleep
> b5 <- numeric(R); for(i in 1:R) { 
+    d$group <- sample(d$group, length(d$group)); 
+    b5[i] <- mean(d$time[d$group==1])-mean(d$time[d$group==2]); 
+ }
> mean(abs(b5) > 3)  # 3 is the observed difference in group means
[1] 0.0372

5
This is technically the permutation test, not a bootstrap p-value.
AdamO

@AdamO I agree that what is presented in this answer is permutation test (and its slightly modified variant); this is because during the resampling the groups are pooled. In contrast, in a bootstrap-based test, values for each group should be sampled using only the data for that same group. Here is one answer explaining how to do it: stats.stackexchange.com/a/187630/28666.
amoeba says Reinstate Monica

@amoeba I think the answer you link is also a permutation test, related to the bootstrap only insofar as they involve resampling. It's perfectly fine to report, but for reporting it is two methods that are being used. A nonparametric bootstrap is technically unable to generate data under a null hypothesis. See my answer for some of how p-values are generated from a bootstrap distribution.
AdamO

@AdamO I guess it's a question of terminology, but I don't see how the procedure described in the linked answer can be called a "permutation" test because nothing is permuted there: resampled values for each group are generated using the data from that group only.
amoeba says Reinstate Monica

1
Elvis, I think the first piece of code in your answer is a permutation test too. When you resample, you pool the groups together! This is what defines a permutation test.
amoeba says Reinstate Monica

25

Elvis's answer relies on permutations, but in my opinion it does not make clear what is wrong with the original bootstrap approach. Let me discuss a solution based solely on the bootstrap.

The crucial problem with your original simulation is that the bootstrap always provides you with the TRUE distribution of the test statistic. However, when computing the p-value you have to compare the observed value of the test statistic to its distribution UNDER H0, i.e. not to the true distribution!

[Let's make it clear. For example, it is known that the test statistic T of the classical t-test has the classical "central" t-distribution under H0 and a noncentral distribution in general. However, everyone is familiar with the fact that the observed value of T is compared to the classical "central" t-distribution, i.e. one does not try to obtain the true [noncentral] t-distribution to make the comparison with T.]

Your p-value of 0.4804 is so large because the observed value "t0" of the test statistic Mean[1]-Mean[2] lies very close to the centre of the bootstrapped sample "t". This is natural, and it is typically always so [i.e. irrespective of the validity of H0], because the bootstrapped sample "t" emulates the ACTUAL distribution of Mean[1]-Mean[2]. But, as noted above [and also by Elvis], what you really need is the distribution of Mean[1]-Mean[2] UNDER H0. It is obvious that

1) under H0 the distribution of Mean[1]-Mean[2] will be centered around 0,

2) its shape does not depend on the validity of H0.

These two points imply that the distribution of Mean[1]-Mean[2] under H0 can be emulated by the bootstrapped sample "t" SHIFTED so that it is centered around 0. In R:

b3.under.H0 <- b3$t - mean(b3$t)

and the corresponding p-value will be:

mean(abs(b3.under.H0) > abs(b3$t0))

which gives you a "very nice" value of 0.0232. :-)

Let me note that the point "2)" mentioned above is called "translation equivariance" of the test statistic, and it does NOT have to hold in general! I.e. for some test statistics, shifting the bootstrapped "t" does not provide a valid estimate of the distribution of the test statistic under H0! Have a look at this discussion, and especially at the reply of P. Dalgaard: http://tolstoy.newcastle.edu.au/R/e6/help/09/04/11096.html

Your testing problem does yield a perfectly symmetric distribution of the test statistic, but keep in mind that there are some problems with obtaining TWO-SIDED p-values when the bootstrapped distribution of the test statistic is skewed. Again, read the above link.

[And finally, I would use the "pure" permutation test in your situation, i.e. the second half of Elvis's answer. :-)]


17

There are numerous ways of calculating bootstrap CIs and p-values. The main issue is that it is impossible for the bootstrap to generate data under a null hypothesis. The permutation test is a viable resampling-based alternative to this. To use a proper bootstrap you must make some assumptions about the sampling distribution of the test statistic.

A comment about the lack of invariance of testing: it is entirely possible to find a 95% CI not inclusive of the null and yet p > 0.05, or vice versa. To have better agreement, the calculation of bootstrap samples under the null must be done as $\beta^0 = \hat{\beta} - \beta^*$ rather than $\beta^0 = \beta^* - \hat{\beta}$. That is to say, if the density is skewed right in the bootstrap sample, the density must be skewed left in the null. It is not really possible to invert tests for CIs with non-analytical (e.g. resampling) solutions such as this.

normal bootstrap

One approach is the normal bootstrap, where you take the mean and standard deviation of the bootstrap distribution, calculate the sampling distribution under the null by shifting the bootstrap distribution, and use normal percentiles from that null distribution at the point of the estimate in the original bootstrap sample. This is a reasonable approach when the bootstrap distribution is normal; visual inspection usually suffices here. Results using this approach are usually very close to robust or sandwich-based error estimation, which is robust against heteroscedasticity and/or finite-sample variance assumptions. The assumption of a normal test statistic is a stronger condition than the assumptions of the next bootstrap test I will discuss.
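As a minimal sketch of this approach, reusing the b3 object from the question above (an assumption on my part; the null value of the mean difference is 0):

# normal bootstrap p-value for the sleep data: centre a normal
# approximation at the bias-corrected estimate 2*t0 - mean(t)
with(b3, pnorm(abs((2*t0 - mean(t) - 0) / sqrt(var(t)[1,1])),
               lower.tail = FALSE) * 2)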

percentile bootstrap

Another approach is the percentile bootstrap, which is what I think most of us consider when we speak of the bootstrap. Here, the bootstrapped distribution of the parameter estimate gives an empirical distribution of the statistic under the alternative hypothesis. This distribution can possibly be non-normal. A 95% CI is easily calculated by taking the empirical quantiles. But one important assumption is that the distribution is pivotal: if the underlying parameter changes, the distribution is only shifted by a constant, and its shape and scale do not change. This is a strong assumption! If it holds, you can generate the "distribution of the statistic under the null hypothesis" (DSNH, or $F_0$) by subtracting the bootstrap distribution from the estimate and recentering at the null value, then calculating what percentage of the DSNH is "more extreme" than your estimate using $p = 2\times\min(F_0(\hat{\beta}),\, 1-F_0(\hat{\beta}))$.
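As a sketch, again reusing the b3 object from the question (an assumption) with a null difference of 0:

# draws under H0: subtract the bootstrap distribution from the estimate
# and recentre at the null value, then apply the two-sided formula
null.draws <- b3$t0 - b3$t + 0
F0 <- mean(null.draws <= b3$t0)   # empirical CDF of the H0 draws at the estimate
2 * min(F0, 1 - F0)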

Studentized bootstrap

The easiest bootstrap solution for calculating p-values is the studentized bootstrap. With each bootstrap iteration, calculate the statistic and its standard error and return the student statistic. This gives a bootstrapped student distribution for the hypothesis, which can be used to calculate CIs and p-values very easily. This also underlies the intuition behind the bias-corrected and accelerated (BCa) bootstrap. The t-distribution shifts much more easily under the null since outlying results are downweighted by their correspondingly high variance.
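A minimal sketch on the sleep data from the question (diff.var is a hypothetical helper, not code from the original post; resampling stays within groups via strata):

# each replicate returns the mean difference and an estimate of its
# variance, so the replicates can be studentized afterwards
diff.var <- function(d, i) {
    d2 <- d[i, ]
    m <- tapply(d2$time, d2$group, mean)
    v <- tapply(d2$time, d2$group, var)
    n <- tapply(d2$time, d2$group, length)
    c(m[1] - m[2], v[1]/n[1] + v[2]/n[2])
}
set.seed(1234)
b6 <- boot(sleep, diff.var, R = 5000, strata = sleep$group)
z <- (b6$t[,1] - b6$t0[1]) / sqrt(b6$t[,2])   # bootstrapped t statistics
t.obs <- b6$t0[1] / sqrt(b6$t0[2])            # observed t under H0: difference = 0
mean(abs(z) >= abs(t.obs))                    # two-sided studentized p-value

The same b6 object would also work with boot.ci(b6, type = "stud"), which expects the variance of the statistic as its second component.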

Programming example

As an example, I'll use the city data in the boot package. The bootstrap confidence intervals are calculated with this code:

library(boot)

ratio <- function(d, w) sum(d$x * w)/sum(d$u * w)   # ratio of weighted sums
city.boot <- boot(city, ratio, R = 999, stype = "w", sim = "ordinary")
boot.ci(city.boot, conf = c(0.90, 0.95),
        type = c("norm", "basic", "perc", "bca"))

and produce this output:

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 999 bootstrap replicates

CALL : 
boot.ci(boot.out = city.boot, conf = c(0.9, 0.95), type = c("norm", 
    "basic", "perc", "bca"))

Intervals : 
Level      Normal              Basic         
90%   ( 1.111,  1.837 )   ( 1.030,  1.750 )   
95%   ( 1.042,  1.906 )   ( 0.895,  1.790 )  

Level     Percentile            BCa          
90%   ( 1.291,  2.011 )   ( 1.292,  2.023 )   
95%   ( 1.251,  2.146 )   ( 1.255,  2.155 )  
Calculations and Intervals on Original Scale

The 95% CI for the normal bootstrap is obtained by calculating:

with(city.boot, 2*t0 - mean(t) + qnorm(c(0.025, 0.975)) %o% sqrt(var(t)[1,1]))

The p-value is thus obtained:

> with(city.boot, pnorm(abs((2*t0 - mean(t) - 1) / sqrt(var(t)[1,1])), lower.tail=F)*2)
[1] 0.0315

which agrees with the 95% normal CI not including the null ratio value of 1.

The percentile CI is obtained (with some differences due to methods for ties):

quantile(city.boot$t, c(0.025, 0.975))

And the p-value for the percentile bootstrap is:

cvs <- quantile(city.boot$t0 - city.boot$t + 1, c(0.025, 0.975))
mean(city.boot$t < cvs[1] | city.boot$t > cvs[2])   # proportion beyond the null critical values

This gives a p of 0.035, which also agrees with the confidence interval in excluding 1. Note that we cannot conclude in general that the percentile CI should yield lower p-values, even though here it is nearly as wide as the normal CI and lies further from the null: the shape of the sampling distribution underlying the percentile CI is non-normal.


It's a very interesting answer @AdamO, but could you provide some examples? In R, you can use the function boot.ci with the argument "type" to choose a studentized CI (you can also choose a BCa CI). However, how can you calculate p-values? Are you using the estimate or the test statistic? I had a similar question, and a reply would be greatly appreciated.
Kevin Zarca

1
+1 for clear explanation of the benefits of the studentized bootstrap.
eric_kernfeld

@KevinOunet I gave two examples of replicating p-values from CIs in the boot package. Does this help?
AdamO

1
Thank you @AdamO, that helps indeed! Could you provide a last example for studentized bootstrap?
Kevin Zarca