Lower than expected coverage for importance sampling with simulation

I was trying to answer the question "Evaluate integral with Importance Sampling method in R". Basically, the user needs to calculate

$\int_{0}^{\pi}f(x)\,dx=\int_{0}^{\pi}\frac{1}{\cos(x)^2+x^2}\,dx$

using the exponential distribution as the importance distribution

$q(x)=\lambda e^{-\lambda x}$

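and find the value of $\lambda$ that gives the better approximation to the integral (it's a self-study question). I recast the problem as the evaluation of the mean value $\mu$ of $f(x)$ over $[0,\pi]$: the integral is then just $\pi\mu$.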

Thus, let $p(x)$ be the pdf of $X\sim\mathcal{U}(0,\pi)$, and let $Y=f(X)$: the goal now is to estimate

$\mu=\mathbb{E}[Y]=\mathbb{E}[f(X)]=\int_{\mathbb{R}}f(x)p(x)\,dx=\int_{0}^{\pi}\frac{1}{\cos(x)^2+x^2}\frac{1}{\pi}\,dx$

using importance sampling. I performed a simulation in R:

# clear the environment and set the seed for reproducibility
rm(list=ls())
gc()
graphics.off()
set.seed(1)

# function to be integrated
f <- function(x){
    1 / (cos(x)^2+x^2)
}

# importance sampling
importance.sampling <- function(lambda, f, B){
    x <- rexp(B, lambda) 
    f(x) / dexp(x, lambda)*dunif(x, 0, pi)
}

# mean value of f
mu.num <- integrate(f,0,pi)$value/pi

# initialize code
means  <- 0
sigmas <- 0
error  <- 0
CI.min <- 0
CI.max <- 0
CI.covers.parameter <- FALSE

# set a value for lambda: we will repeat importance sampling N times to verify
# coverage
N <- 100
lambda <- rep(20,N)

# set the sample size for importance sampling
B <- 10^4

# - estimate the mean value of f using importance sampling, N times
# - compute a confidence interval for the mean each time
# - CI.covers.parameter is set to TRUE if the estimated confidence 
#   interval contains the mean value computed by integrate, otherwise
# is set to FALSE
j <- 0
for(i in lambda){
    I <- importance.sampling(i, f, B)
    j <- j + 1
    mu <- mean(I)
    std <- sd(I)
    lower.CB <- mu - 1.96*std/sqrt(B)  
    upper.CB <- mu + 1.96*std/sqrt(B)  
    means[j] <- mu
    sigmas[j] <- std
    error[j] <- abs(mu-mu.num)
    CI.min[j] <- lower.CB
    CI.max[j] <- upper.CB
    CI.covers.parameter[j] <- lower.CB < mu.num & mu.num < upper.CB
}

# build a dataframe in case you want to have a look at the results for each run
df <- data.frame(lambda, means, sigmas, error, CI.min, CI.max, CI.covers.parameter)

# so, what's the coverage?
mean(CI.covers.parameter)
# [1] 0.19

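The code is basically a straightforward implementation of importance sampling, following the notation used here. The importance sampling is then repeated $N$ times to get multiple estimates of $\mu$, and each time a check is made on whether the 95% interval covers the actual mean.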
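For reference, the estimator and interval that the loop computes can be written out as follows. This is only a restatement of the code in the question's notation; the symbols $w$ and $\hat\sigma_B$ are shorthand introduced here, not part of the original post.

```latex
\[
\hat\mu_B = \frac{1}{B}\sum_{i=1}^{B} w(X_i), \qquad
w(x) = \frac{f(x)\,p(x)}{q(x)}, \qquad
X_1,\dots,X_B \overset{\text{iid}}{\sim} q
\]
\[
\hat\mu_B \pm 1.96\,\frac{\hat\sigma_B}{\sqrt{B}}, \qquad
\hat\sigma_B^2 = \frac{1}{B-1}\sum_{i=1}^{B}\bigl(w(X_i)-\hat\mu_B\bigr)^2
\]
```

The interval has roughly 95% coverage only when $\hat\sigma_B$ is a reliable estimate of the standard deviation of $w(X)$ under $q$.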

As you can see, for $\lambda=20$ the actual coverage is just 0.19. Increasing $B$ to values such as $10^6$ doesn't help (the coverage is even smaller, 0.15). Why is this happening?

r simulation exponential importance-sampling

— DeltaIV

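Using an importance function with infinite support for an integral over a finite support is not optimal, since part of the simulations is spent simulating zeros, so to speak. At least truncate the exponential at $\pi$, which is easy to do and to simulate.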

— Xi'an
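The truncation suggested in this comment could be sketched as below. This is a minimal sketch under stated assumptions, not code from the thread: the helper names `dtexp`, `rtexp` and `importance.sampling.trunc` are hypothetical, sampling uses the inverse-CDF method, and $\lambda = 1$ is chosen only for illustration.

```r
set.seed(1)

# function to be integrated (same as in the question)
f <- function(x) 1 / (cos(x)^2 + x^2)

# density of Exp(lambda) truncated to [0, upper] (hypothetical helper)
dtexp <- function(x, lambda, upper) {
  ifelse(x >= 0 & x <= upper, dexp(x, lambda) / pexp(upper, lambda), 0)
}

# inverse-CDF sampler for the truncated exponential (hypothetical helper)
rtexp <- function(n, lambda, upper) {
  qexp(runif(n) * pexp(upper, lambda), lambda)
}

# importance sampling weights f(x) p(x) / q(x), with q truncated to [0, pi]
importance.sampling.trunc <- function(lambda, f, B) {
  x <- rtexp(B, lambda, pi)
  f(x) * dunif(x, 0, pi) / dtexp(x, lambda, pi)
}

w      <- importance.sampling.trunc(lambda = 1, f = f, B = 10^4)
mu.hat <- mean(w)
se.hat <- sd(w) / sqrt(length(w))
mu.num <- integrate(f, 0, pi)$value / pi

c(estimate = mu.hat, std.error = se.hat, reference = mu.num)
```

With the truncated density every draw falls inside $[0,\pi]$, so no weight is exactly zero. Truncation alone does not make a poorly matched $\lambda$ a good choice.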

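@Xi'an sure, I agree: if I had to evaluate that integral by importance sampling I wouldn't use that importance distribution, but I was trying to answer the original question, which required using the exponential distribution. My problem was that, even if this approach is far from optimal, the coverage should still increase (on average) as $B\to\infty$. And that's what Greenparker has shown.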

— DeltaIV

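Importance sampling is quite sensitive to the choice of importance distribution. Since you chose $\lambda = 20$, the samples that you draw using rexp will have a mean of $1/20$ and a variance of $1/400$. This is the distribution you get: [figure omitted]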

However, the integral you want to evaluate goes from 0 to $\pi \approx 3.14$. So you want to use a $\lambda$ that gives you such a range. I use $\lambda = 1$.
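The mismatch between the sampling range and the integration range can be checked directly with the exponential quantile and distribution functions. The sketch below is illustrative only; the names `lambdas` and `coverage` are introduced here and do not come from the original code.

```r
# Compare where Exp(lambda) draws land relative to the range [0, pi]
lambdas <- c(20, 1)

coverage <- data.frame(
  lambda      = lambdas,
  mean        = 1 / lambdas,                               # mean of Exp(lambda)
  variance    = 1 / lambdas^2,                             # variance of Exp(lambda)
  q999        = qexp(0.999, lambdas),                      # 99.9% of draws fall below this
  p.beyond.pi = pexp(pi, lambdas, lower.tail = FALSE)      # share of draws beyond pi
)
print(coverage)
```

For $\lambda = 20$ the 99.9% quantile is about 0.35, so almost no draws reach the upper part of $[0,\pi]$. For $\lambda = 1$ roughly 4% of draws fall beyond $\pi$ and get weight zero.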

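Using $\lambda = 1$, the samples explore the full range from 0 to $\pi$, and only a few draws beyond $\pi$ are wasted. Rerunning your code with the only change being $\lambda = 1$: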

# clear the environment and set the seed for reproducibility
rm(list=ls())
gc()
graphics.off()
set.seed(1)

# function to be integrated
f <- function(x){
  1 / (cos(x)^2+x^2)
}

# importance sampling
importance.sampling <- function(lambda, f, B){
  x <- rexp(B, lambda) 
  f(x) / dexp(x, lambda)*dunif(x, 0, pi)
}

# mean value of f
mu.num <- integrate(f,0,pi)$value/pi

# initialize code
means  <- 0
sigmas <- 0
error  <- 0
CI.min <- 0
CI.max <- 0
CI.covers.parameter <- FALSE

# set a value for lambda: we will repeat importance sampling N times to verify
# coverage
N <- 100
lambda <- rep(1,N)

# set the sample size for importance sampling
B <- 10^4

# - estimate the mean value of f using importance sampling, N times
# - compute a confidence interval for the mean each time
# - CI.covers.parameter is set to TRUE if the estimated confidence 
#   interval contains the mean value computed by integrate, otherwise
# is set to FALSE
j <- 0
for(i in lambda){
  I <- importance.sampling(i, f, B)
  j <- j + 1
  mu <- mean(I)
  std <- sd(I)
  lower.CB <- mu - 1.96*std/sqrt(B)  
  upper.CB <- mu + 1.96*std/sqrt(B)  
  means[j] <- mu
  sigmas[j] <- std
  error[j] <- abs(mu-mu.num)
  CI.min[j] <- lower.CB
  CI.max[j] <- upper.CB
  CI.covers.parameter[j] <- lower.CB < mu.num & mu.num < upper.CB
}

# build a dataframe in case you want to have a look at the results for each run
df <- data.frame(lambda, means, sigmas, error, CI.min, CI.max, CI.covers.parameter)

# so, what's the coverage?
mean(CI.covers.parameter)
#[1] .95

The choice of $\lambda$ is therefore crucial.

EDIT -----

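As for the coverage probability decreasing when you go from $B = 10^4$ to $B = 10^6$, that is just random chance, given that you use only $N = 100$ replications. The confidence interval for the coverage probability at $B = 10^4$ is

$.19 \pm 1.96\sqrt{\dfrac{.19(1-.19)}{100}} = .19 \pm .0769 = (.1131, .2669)\,.$

So you cannot really say that increasing to $B = 10^6$ significantly lowers the coverage probability.

In fact, with the same seed, change $N = 100$ to $N = 1000$: with $B = 10^4$ the coverage probability is $.123$, and with $B = 10^6$ it is $.158$. The confidence interval around $.123$ is

$.123 \pm 1.96\sqrt{\dfrac{.123(1 - .123)}{1000}} = .123 \pm .0203 = (.102, .143)\,.$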

Thus, with $N = 1000$ replications, the coverage probability does increase significantly with $B$.
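The two intervals above are ordinary Wald intervals for a proportion, and can be reproduced with a small helper. The function name `coverage.ci` is hypothetical and introduced only for this sketch.

```r
# Wald interval for a coverage proportion estimated from N replications
coverage.ci <- function(p.hat, N, z = 1.96) {
  half.width <- z * sqrt(p.hat * (1 - p.hat) / N)
  c(lower = p.hat - half.width, upper = p.hat + half.width)
}

ci.100  <- coverage.ci(0.19, 100)    # N = 100,  B = 10^4
ci.1000 <- coverage.ci(0.123, 1000)  # N = 1000, B = 10^4

print(round(ci.100, 4))
print(round(ci.1000, 4))
```

The value $.158$ reported for $B = 10^6$ with $N = 1000$ lies above the upper limit of the second interval, which is the basis for calling the increase significant.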

— Greenparker

[Comment text not recoverable. The surviving fragments are, in order: $\lambda$, $0.1<\lambda<2$, $\lambda$, $\lambda=20$, $10^4$, $10^6$, $\lambda$.]

— DeltaIV '17

[Comment text not recoverable. The surviving fragments are $N = 100$ and $N=1000$.]