How can I draw random samples from a non-parametrically estimated distribution?

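I have a sample of 100 points that are continuous and one-dimensional. I estimated its non-parametric density using kernel methods. How can I draw random samples from this estimated distribution?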

r sampling kernel-smoothing

— Lovekesh

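A kernel density estimate is a mixture distribution: for every observation, there is one kernel. If the kernel is a scaled density, this leads to a simple algorithm for sampling from the kernel density estimate: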

repeat nsim times:
  sample (with replacement) a random observation from the data
  sample from the kernel, and add the previously sampled random observation
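
To see why this two-step scheme works, it may help to write the estimate out. The notation here is introduced for illustration only: $n$ observations $x_1, \dots, x_n$, a bandwidth $h$, and a kernel $K$ that is assumed to be a proper density.

$$
\hat f(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h} K\!\left(\frac{x - x_i}{h}\right)
$$

Each term $\frac{1}{h} K\!\left(\frac{x - x_i}{h}\right)$ is the density of $x_i + h\,\varepsilon$ with $\varepsilon \sim K$, and every term carries weight $1/n$. So if an index $I$ is drawn uniformly from $\{1, \dots, n\}$ and $\varepsilon \sim K$ is drawn independently, then $X = x_I + h\,\varepsilon$ has density $\hat f$.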

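If, for example, you used a Gaussian kernel, your density estimate is a mixture of 100 normals, each centred at one of your sample points and all with standard deviation $h$ equal to the estimated bandwidth. To draw a sample, you can pick one of your sample points with replacement (say $x_i$) and then draw from a $N(\mu = x_i, \sigma = h)$. In R: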

# Original distribution is exp(rate = 5)
N = 1000
x <- rexp(N, rate = 5)

hist(x, prob = TRUE)
lines(density(x))

# Store the bandwidth of the estimated KDE
bw <- density(x)$bw

# Draw from the sample and then from the kernel
means <- sample(x, N, replace = TRUE)
hist(rnorm(N, mean = means, sd = bw), prob = TRUE)
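
The two lines above can be wrapped into a small reusable function and sanity-checked against the moments a Gaussian KDE should have. This is only a sketch: `rkde_gauss` is a hypothetical helper name, not part of base R or of the original answer, and it assumes the default Gaussian kernel of `density()`.

```r
# Hypothetical helper: draw n_draws values from a Gaussian KDE fitted to `data`.
rkde_gauss <- function(n_draws, data, bw = density(data)$bw) {
  # Step 1: pick mixture components (data points) uniformly, with replacement
  idx <- sample.int(length(data), n_draws, replace = TRUE)
  # Step 2: add Gaussian kernel noise with standard deviation bw
  rnorm(n_draws, mean = data[idx], sd = bw)
}

set.seed(1)
x <- rexp(1000, rate = 5)
bw <- density(x)$bw
draws <- rkde_gauss(1e5, x, bw)

# The KDE has the same mean as the data ...
c(kde_sample = mean(draws), data = mean(x))
# ... and variance equal to the (1/n) empirical variance plus bw^2
c(kde_sample = var(draws), theory = mean((x - mean(x))^2) + bw^2)
```

The second comparison shows that a KDE sample is slightly more spread out than the data, by `bw^2`, which is worth keeping in mind if the simulated values feed into a variance estimate.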

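Strictly speaking, because the mixture components are equally weighted, you could skip the sampling-with-replacement step and simply draw a sample of size $M$ from each component of the mixture: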

M = 10
hist(rnorm(N * M, mean = x, sd = bw))

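If for some reason you cannot draw from your kernel (for example, your kernel is not a density), you can try importance sampling or MCMC. For example, using importance sampling: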

# Draw from proposal distribution which is normal(mu, sd = 1)
sam <- rnorm(N, mean(x), 1)

# Weight the sample using ratio of target and proposal densities
w <- sapply(sam, function(input) sum(dnorm(input, mean = x, sd = bw)) / 
                                 dnorm(input, mean(x), 1))

# Resample according to the weights to obtain an un-weighted sample
finalSample <- sample(sam, N, replace = TRUE, prob = w)

hist(finalSample, prob = TRUE)
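
One check that may be useful here, offered as a sketch rather than as part of the original answer: the effective sample size of the importance weights, computed as `sum(w)^2 / sum(w^2)`. A value far below `N` suggests the proposal matches the KDE poorly and the resampled values will contain many duplicates. The block repeats the setup above so that it runs on its own; the variable name `ess` is introduced here.

```r
set.seed(1)
N <- 1000
x <- rexp(N, rate = 5)
bw <- density(x)$bw

# Proposal draws from normal(mean(x), sd = 1), as in the example above
sam <- rnorm(N, mean(x), 1)

# Unnormalised weights: Gaussian KDE (up to a constant) over proposal density
w <- sapply(sam, function(s) sum(dnorm(s, mean = x, sd = bw)) /
                             dnorm(s, mean(x), 1))

# Effective sample size: always between 1 and N
ess <- sum(w)^2 / sum(w^2)
ess
```

If `ess` turns out small, a proposal whose spread is closer to that of the data would be the first thing to try.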

P.S. My thanks to Glen_b, who contributed to this answer.

— Matteo Fasiolo

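Sorry, I went straight to importance sampling and only then realized that sampling directly is usually much simpler. I have incorporated your initial explanation into my answer. Thanks a lot.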

— Matteo Fasiolo, 2014

@Matteo Fasiolo: do you have a reference to a paper that I could cite for this method?

— Pallavi