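Examples of errors in MCMC algorithms

I am investigating a method for automatically checking Markov chain Monte Carlo methods, and I would like some examples of mistakes that can occur when constructing or implementing such algorithms. Bonus points if the incorrect method was used in a published paper.

I am particularly interested in cases where the error means that the chain has the wrong invariant distribution, though other types of errors (e.g. the chain not being ergodic) would be of interest as well.

An example of such an error would be failing to output a value when Metropolis-Hastings rejects a proposed move.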
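To make the rejection bug concrete, here is a minimal R sketch. The N(0, 1) target, the proposal standard deviation of 5, and the function name `rw_mh` are all my own assumptions for illustration; nothing here comes from a published implementation. With `record_rejections = FALSE` the sampler outputs a value only when a move is accepted, which is the error described above.

```r
# Random-walk Metropolis-Hastings for a N(0, 1) target (illustrative sketch).
# record_rejections = FALSE reproduces the bug: nothing is output on rejection.
rw_mh <- function(n_iter, prop_sd, record_rejections = TRUE) {
  x <- 0
  out <- rep(NA_real_, n_iter)
  for (i in seq_len(n_iter)) {
    y <- x + rnorm(1, sd = prop_sd)                      # symmetric proposal
    accept <- log(runif(1)) < dnorm(y, log = TRUE) - dnorm(x, log = TRUE)
    if (accept) x <- y
    # Correct behaviour: the current state is recorded again after a rejection.
    if (accept || record_rejections) out[i] <- x
  }
  out[!is.na(out)]
}

set.seed(1)
correct <- rw_mh(50000, prop_sd = 5)
buggy   <- rw_mh(50000, prop_sd = 5, record_rejections = FALSE)

# The target variance is 1; the accepted-only output over-represents the tails.
c(correct = var(correct), buggy = var(buggy))
```

The buggy output is the sequence of accepted states, whose invariant density is proportional to the target times the acceptance probability at each point, so its sample variance should come out visibly larger than 1 here.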

mcmc

— Simon Byrne

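One of my favourite examples is the harmonic mean estimator, because it has nice asymptotic properties but fails to work in practice. Radford Neal discusses this on his blog: "The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe". This method has been widely implemented in applications.

Another one, courtesy of Prof. Neal.

— Cyan

@Cyan For Neal to be taken seriously, I think he should have found a journal that would accept his article rather than just posting it on the internet. I can easily believe that he is right and that the referees and the author are wrong. Although it is difficult to publish papers that contradict published results, and the JASA rejection is discouraging, I think he should have tried several other journals until he succeeded. You need impartial and independent referees to add credibility to your findings.

— Michael R. Chernick, 2012

One should always take Prof. Neal seriously! ;o) It is a real shame that results like this are difficult to publish, and sadly modern academic culture doesn't seem to value this sort of thing, so it is understandable if it isn't a priority for him. Interesting question, I am very interested in the answers.

— Dikran Marsupial, 2012

@Michael: Perhaps. Having been on all sides of similar situations many times, including in Prof. Neal's position, my anecdotal observation is that a paper rejection, like most acceptances, carries very little information in most cases. Peer review is orders of magnitude noisier than people care to admit and, as may be the case here, there are often partial and interested (i.e., not independent) parties and interests at play. That said, I did not intend for my original comment to take us this far from the topic at hand. Thanks for sharing your thoughts on the matter.

— cardinal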

Answers:

1. Marginal likelihood and the harmonic mean estimator

The marginal likelihood is defined as the normalising constant of the posterior distribution

$p({\bf x})=\int_{\Theta}p({\bf x}\vert\theta)p(\theta)d\theta.$

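The importance of this quantity comes from the role it plays in model comparison via Bayes factors.

Several methods have been proposed for approximating this quantity. Raftery et al. (2007) proposed the harmonic mean estimator, which quickly became popular due to its simplicity. The idea consists of using the relationship

$\dfrac{1}{p({\bf x})}=\int_{\Theta}\dfrac{p(\theta\vert{\bf x})}{p({\bf x}\vert\theta)}d\theta.$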

Therefore, if we have a sample from the posterior, say $(\theta_1,...,\theta_N)$, this quantity can be approximated by

$\dfrac{1}{p({\bf x})}\approx\dfrac{1}{N}\sum_{j=1}^N \dfrac{1}{p({\bf x}\vert\theta_j)}.$

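This approximation is related to the concept of importance sampling.

By the law of large numbers, as discussed in Neal's blog, this estimator is consistent. The problem is that the $N$ required for a good approximation can be huge. See Neal's blog or Robert's blog 1, 2, 3, 4 for some examples.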
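As a rough illustration of how slowly the harmonic mean estimator can converge, here is a small R sketch on a toy model where the marginal likelihood is known in closed form. The model (one observation $x \sim N(\theta, 1)$ with prior $\theta \sim N(0, \tau^2)$), the values `x = 2` and `tau = 100`, and the helper name `harmonic_mean` are my own assumptions, chosen only so that the truth is available for comparison.

```r
set.seed(1)
x   <- 2      # single observation (assumed for illustration)
tau <- 100    # prior standard deviation (assumed; a vague prior)

# Exact marginal likelihood: x ~ N(0, 1 + tau^2)
truth <- dnorm(x, mean = 0, sd = sqrt(1 + tau^2))

# Exact posterior: theta | x ~ N(x * tau^2 / (1 + tau^2), tau^2 / (1 + tau^2))
post_mean <- x * tau^2 / (1 + tau^2)
post_sd   <- sqrt(tau^2 / (1 + tau^2))

# Harmonic mean estimator computed from n independent posterior draws
harmonic_mean <- function(n) {
  theta <- rnorm(n, mean = post_mean, sd = post_sd)
  1 / mean(1 / dnorm(x, mean = theta, sd = 1))
}

estimates <- sapply(c(1e3, 1e4, 1e5), harmonic_mean)
rbind(estimate = estimates, truth = truth)
```

Even with exact, independent posterior draws the estimates barely move towards the truth as $N$ grows, because the sum is dominated by rare draws where the likelihood is tiny. In this toy setting the estimator has infinite variance whenever $\tau^2 > 1$.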

Alternatives

There are many alternatives for approximating $p({\bf x})$. Chopin and Robert (2008) present some methods based on importance sampling.

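2. Not running your MCMC sampler long enough (especially in the presence of multimodality)

Mendoza and Gutierrez-Peña (1999) derive the reference prior/posterior for the ratio of two normal means and present an example of the inferences obtained with this model using a real data set. Using MCMC methods, they obtain a sample of size $2000$ from the posterior of the ratio of means $\varphi$, which is shown below

[Figure: posterior sample of $\varphi$]

and obtain the HPD interval $(0.63,5.29)$ for $\varphi$. After analysing the expression of the posterior distribution, it is easy to see that it has a singularity at $0$, and the posterior should actually look like this (note the singularity at $0$)

[Figure: posterior of $\varphi$ with a singularity at $0$]

This can only be detected if you run your MCMC sampler long enough or use an adaptive method. The HPD interval obtained with one of these methods is $(0,7.25)$, as has already been reported. The HPD interval is considerably longer, which has important implications when it is compared with frequentist/classical methods.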
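The same failure is easy to reproduce on a toy problem. The sketch below is not the Mendoza and Gutierrez-Peña model; it assumes a two-component normal mixture target with modes at $-5$ and $5$, and the function names and tuning values are hypothetical. A short random-walk chain with a small step never leaves the mode it starts in, while a longer run with a wider proposal visits both.

```r
# Toy bimodal target: equal mixture of N(-5, 1) and N(5, 1) (assumed example)
log_target <- function(x) log(0.5 * dnorm(x, -5) + 0.5 * dnorm(x, 5))

# Plain random-walk Metropolis sampler
rw_metropolis <- function(n_iter, prop_sd, x0) {
  x <- x0
  out <- numeric(n_iter)
  for (i in seq_len(n_iter)) {
    y <- x + rnorm(1, sd = prop_sd)
    if (log(runif(1)) < log_target(y) - log_target(x)) x <- y
    out[i] <- x
  }
  out
}

set.seed(1)
short <- rw_metropolis(2000,  prop_sd = 0.5, x0 = -5)  # short run, small steps
wide  <- rw_metropolis(50000, prop_sd = 5,   x0 = -5)  # longer run, wider steps

# Under the target, half of the mass lies above zero.
c(short = mean(short > 0), wide = mean(wide > 0))
```

The short chain looks perfectly well behaved on a trace plot, which is what makes this kind of error hard to spot without running longer, starting from dispersed initial values, or changing the proposal.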

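3. Some other issues, such as assessing convergence, the choice of starting values, and poor behaviour of the chain, can be found in this discussion by Gelman, Carlin and Neal.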

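4. Importance sampling

A method for approximating an integral consists of multiplying and dividing the integrand by a density $g$, with the same support, from which we can simulate:

$I=\int f(x)dx = \int \dfrac{f(x)}{g(x)}g(x)dx.$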

Then, if we have a sample from $g$ , $(x_1,...,x_N)$ , we can approximate $I$ as follows

$I\approx \dfrac{1}{N}\sum_{j=1}^N \dfrac{f(x_j)}{g(x_j)}.$

A possible issue is that $g$ should have tails that are heavier than, or similar to, those of $f$; otherwise the $N$ required for a good approximation could be huge. See the following toy example in R.

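# Integrating a Student's t density with 1 d.f. (true value: 1)
# using a normal importance function, whose tails are too light
set.seed(1)               # for reproducibility
x1 <- rnorm(10000000)     # N = 10,000,000
mean(dt(x1, df = 1) / dnorm(x1))

# Now using a Student's t with 2 d.f. as the importance function
x2 <- rt(1000, df = 2)    # N = 1,000
mean(dt(x2, df = 1) / dt(x2, df = 2))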

They're some great examples. For anyone who is interested, the letter to the editor with the figure is here: onlinelibrary.wiley.com/doi/10.1002/bimj.200800256/abstract

— Simon Byrne

Very nice and clear summary!! (+1)

— gui11aume

Darren Wilkinson on his blog gives a detailed example of a common mistake in random walk Metropolis-Hastings. I recommend reading it in full, but here is the tl;dr version.

If the target distribution has positive support in one dimension (like a Gamma distribution), it is tempting to reject straight away any proposal with a negative value in that dimension. The mistake is to throw those proposals away as if they had never happened and to evaluate the Metropolis-Hastings (MH) acceptance ratio only for the remaining ones. This is a mistake because it amounts to using a non-symmetric proposal density while still using the acceptance ratio for a symmetric one.

The author suggests applying one of two fixes:

1. Count the "negatives" as failed acceptances (and lose a bit of efficiency).
2. Use the correct MH ratio in that case, which is

$\frac{\pi(x^*)}{\pi(x)} \frac{\Phi(x)}{\Phi(x^*)},$

where $\pi$ is the target density and $\Phi$ is the normalization constant of the truncated random walk proposal $\phi$ , i.e. $\Phi(x) = \int_0^{\infty} \phi(y-x)dy$ .
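Here is a hedged R sketch of the mistake and of both fixes. The Gamma(2, 1) target (mean 2), the normal innovations with `sigma = 3`, and the function and method names are my own assumptions for illustration; this is not the code from the blog post. With normal innovations of standard deviation $\sigma$, the normalisation constant is $\Phi(x) = P(x + \sigma Z > 0)$, which is `pnorm(x / sigma)` in R.

```r
# Target: Gamma(shape = 2, rate = 1), which has mean 2 (assumed example)
log_target <- function(x) dgamma(x, shape = 2, rate = 1, log = TRUE)

truncated_rw <- function(n_iter, sigma,
                         method = c("wrong", "reject", "corrected")) {
  method <- match.arg(method)
  x <- 1
  out <- numeric(n_iter)
  for (i in seq_len(n_iter)) {
    y <- x + rnorm(1, sd = sigma)
    if (method == "reject") {
      # Fix 1: a negative proposal counts as a rejected move.
      log_ratio <- if (y > 0) log_target(y) - log_target(x) else -Inf
    } else {
      # Redraw until positive, so the proposal is a truncated normal.
      while (y <= 0) y <- x + rnorm(1, sd = sigma)
      log_ratio <- log_target(y) - log_target(x)
      if (method == "corrected") {
        # Fix 2: multiply the ratio by Phi(x) / Phi(y).
        log_ratio <- log_ratio +
          pnorm(x / sigma, log.p = TRUE) - pnorm(y / sigma, log.p = TRUE)
      }
    }
    if (log(runif(1)) < log_ratio) x <- y
    out[i] <- x
  }
  out
}

set.seed(1)
means <- sapply(c(wrong = "wrong", reject = "reject", corrected = "corrected"),
                function(m) mean(truncated_rw(100000, sigma = 3, method = m)))
means
```

The "wrong" chain satisfies detailed balance with respect to a density proportional to $\pi(x)\Phi(x)$ rather than $\pi(x)$, so its sample mean should drift above 2, while both fixes should stay close to 2.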

— gui11aume

+1 Interesting example. I was also thinking about other issues with MH related to the acceptance rate. I think the 0.234 optimal rate has been overused.

@Procrastinator you know the MCMC literature very well. Is this your domain of expertise?

— gui11aume

Thanks for your comment. I like Bayesian statistics, so I need to carry the MCMC cross ;).

A very clear case (connected with the marginal likelihood approximation mentioned in the first answer) where true convergence is the issue is the problem of label switching in mixture models coupled with the use of Chib's (1995) estimator. As pointed out by Radford Neal (1999), if the MCMC chain does not converge correctly, in the sense that it does not explore some of the modes of the target distribution, Chib's Monte Carlo approximation fails to reach the right numerical value.

— Xi'an