From a Bayesian probability perspective, why doesn't a 95% confidence interval contain the true parameter with 95% probability?



From the Wikipedia page on confidence intervals:

...if confidence intervals are constructed across many separate data analyses of repeated (and possibly different) experiments, the proportion of such intervals that contain the true value of the parameter will match the confidence level...

And from the same page:

A confidence interval does not predict that the true value of the parameter has a particular probability of being in the confidence interval given the data actually obtained.

If I understand this correctly, that last statement reflects the frequentist interpretation of probability. However, from a Bayesian probability perspective, why doesn't a 95% confidence interval contain the true parameter with 95% probability? And if it doesn't, what is wrong with the following reasoning?

If I have a process that I know produces a correct answer 95% of the time, then the probability of the next answer being correct is 0.95 (given that I don't have any extra information regarding the process). Similarly, if someone shows me a confidence interval that is created by a process that will contain the true parameter 95% of the time, should I not be right in saying that it contains the true parameter with 0.95 probability, given what I know?

This question is similar to, but not the same as, Why does a 95% CI not imply a 95% chance of containing the mean? The answers to that question focus on why, from a frequentist perspective, a 95% CI does not imply a 95% chance of containing the mean. My question is the same, but from a Bayesian probability perspective.
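For what it's worth, the coverage property that the reasoning above leans on is easy to see in a quick simulation. The sketch below is purely illustrative (the true mean, σ, and sample size are made-up values); it repeatedly forms the standard 95% z-interval for a normal mean with known σ and counts how often the interval covers the truth.

```python
import numpy as np

# A minimal simulation sketch of the "process that is right 95% of the
# time" picture.  The true mean, sigma, and sample size are made-up
# illustrative values; the interval is the standard z-interval for known
# sigma.  Sample means are drawn directly from N(mu, sigma/sqrt(n)).
rng = np.random.default_rng(0)
true_mu, sigma, n, n_reps = 10.0, 2.0, 25, 100_000
z = 1.959964  # two-sided 95% point of the standard normal

xbar = rng.normal(true_mu, sigma / np.sqrt(n), size=n_reps)  # sample means
half_width = z * sigma / np.sqrt(n)
covered = np.abs(xbar - true_mu) <= half_width

# Long-run proportion of intervals that cover the true mean: close to 0.95,
# exactly as the frequentist definition promises.  It says nothing about
# any one realised interval.
print(covered.mean())
```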


One way to think of the 95% CI is as a "long-run average". Now there are many ways of splitting up the "short-run" cases so as to get fairly arbitrary coverage, but averaged over all of them the overall coverage is 95%. Another, more abstract way is to generate $x_i \sim \mathrm{Bernoulli}(p_i)$ for $i = 1, 2, \dots$ such that the $p_i$ average to 0.95 in the long run. You can do this in many ways. Here $x_i$ indicates whether the CI created with the $i$-th data set contains the parameter, and $p_i$ is the coverage probability for that case.
probabilityislogic
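A quick numerical sketch of that comment's point (the particular split of the $p_i$ into 0.90 and 1.00 is an arbitrary illustration, not from the comment):

```python
import numpy as np

# The per-case coverage probabilities p_i can be split quite arbitrarily
# (here alternating 0.90 and 1.00), yet the long-run average coverage of
# the indicators x_i ~ Bernoulli(p_i) is still 0.95.
rng = np.random.default_rng(1)
n = 200_000
p = np.where(np.arange(n) % 2 == 0, 0.90, 1.00)  # mean(p) == 0.95
x = rng.random(n) < p                            # x_i ~ Bernoulli(p_i)
print(p.mean(), x.mean())                        # both approximately 0.95
```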

Answers:



Update: With a few years of hindsight, I have written a more concise treatment of essentially the same material in an answer to a similar question.


How to Construct a Confidence Region

Let's begin with a general method for constructing confidence regions. It can be applied to a single parameter, to yield a confidence interval or a set of intervals; and it can be applied to two or more parameters, to yield higher-dimensional confidence regions.

We assert that the observed statistic $D$ originates from a distribution with parameter $\theta$, i.e. from the sampling distribution $s(d|\theta)$ over possible statistics $d$, and we seek a confidence region for $\theta$ within its set of possible values $\Theta$. Define a highest density region (HDR): the $h$-HDR of a PDF is the smallest subset of its domain that supports probability $h$. Denote the $h$-HDR of $s(d|\psi)$ by $H_\psi$, for any $\psi \in \Theta$. Then the $h$ confidence region for $\theta$, given the data $D$, is the set $C_D = \{\phi : D \in H_\phi\}$. A typical value of $h$ is 0.95.
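As an illustration of this construction, here is a small numerical sketch under made-up assumptions (a normal sampling distribution with known spread, an arbitrary observed statistic $D$, and $h = 0.95$); it builds $C_D$ on a grid exactly as the definition says, using the fact that the $h$-HDR of a normal PDF is the central interval.

```python
import numpy as np
from scipy import stats

# Sketch of the construction, under illustrative assumptions:
# sampling distribution s(d | psi) = Normal(psi, sd), h = 0.95.
h, sd, D = 0.95, 1.0, 2.3            # the observed statistic D is made up
z = stats.norm.ppf(0.5 + h / 2)      # half-width of the h-HDR of a normal

psi_grid = np.linspace(-5, 10, 2001)             # candidate parameter values
# For a normal PDF the h-HDR is the central interval psi +/- z*sd.
in_HDR = np.abs(D - psi_grid) <= z * sd          # is D in H_psi ?
C_D = psi_grid[in_HDR]                           # C_D = {psi : D in H_psi}
print(C_D.min(), C_D.max())                      # approximately D -/+ z*sd
```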

The Frequentist Interpretation

From the preceding definition of a confidence region it follows that
$$d \in H_\psi \longleftrightarrow \psi \in C_d$$
with $C_d = \{\phi : d \in H_\phi\}$. Now imagine a large set of (imaginary) observations $\{D_i\}$, taken in circumstances similar to those of $D$; i.e. they are samples from $s(d|\theta)$. Since $H_\theta$ supports probability mass $h$ of the PDF $s(d|\theta)$, we have $P(D_i \in H_\theta) = h$ for all $i$. Therefore the fraction of the $\{D_i\}$ for which $D_i \in H_\theta$ is $h$; and so, using the equivalence above, the fraction of the $\{D_i\}$ for which $\theta \in C_{D_i}$ is also $h$.

This, then, is what the frequentist claim about the $h$ confidence region for $\theta$ amounts to:

Take a large number of imaginary observations $\{D_i\}$ from the sampling distribution $s(d|\theta)$ that gave rise to the observed statistic $D$. Then $\theta$ lies within a fraction $h$ of the analogous but imaginary confidence regions $\{C_{D_i}\}$.

The confidence region $C_D$ therefore makes no claim at all about the probability that $\theta$ lies anywhere! The reason is simply that there is nothing in the formulation that allows us to speak of a probability distribution over $\theta$. The interpretation is elaborate superstructure that does not improve on the base. The base is just $s(d|\theta)$ and $D$, in which $\theta$ does not appear as a distributed quantity, and there is no information we can use to address that. There are basically two ways to get a distribution over $\theta$:

  1. Assign a distribution directly from the information at hand: $p(\theta|I)$.
  2. Relate $\theta$ to another distributed quantity: $p(\theta|I) = \int p(\theta x|I)\,dx = \int p(\theta|xI)\,p(x|I)\,dx$.

In both cases, $\theta$ must appear somewhere on the left. Frequentists cannot use either method, because they both require a heretical prior.

The Bayesian View

A Bayesian can make unconditional sense of the $h$ confidence region $C_D$ simply through the direct interpretation: it is the set of $\phi$ for which $D$ falls in the $h$-HDR $H_\phi$ of the sampling distribution $s(d|\phi)$. It does not necessarily tell us much about $\theta$, and here is why.

The probability that $\theta \in C_D$, given $D$ and the background information $I$, is
$$P(\theta \in C_D|DI) = \int_{C_D} p(\theta|DI)\,d\theta = \int_{C_D} \frac{p(D|\theta I)\,p(\theta|I)}{p(D|I)}\,d\theta$$
Notice that, unlike the frequentist interpretation, we have immediately demanded a distribution over $\theta$. The background information $I$ tells us, as before, that the sampling distribution is $s(d|\theta)$:
$$P(\theta \in C_D|DI) = \int_{C_D} \frac{s(D|\theta)\,p(\theta|I)}{p(D|I)}\,d\theta = \frac{\int_{C_D} s(D|\theta)\,p(\theta|I)\,d\theta}{p(D|I)}$$
i.e.
$$P(\theta \in C_D|DI) = \frac{\int_{C_D} s(D|\theta)\,p(\theta|I)\,d\theta}{\int s(D|\theta)\,p(\theta|I)\,d\theta}$$
Now this expression does not in general evaluate to $h$, which is to say that the $h$ confidence region $C_D$ does not always contain $\theta$ with probability $h$. Indeed it can be starkly different from $h$. There are, however, many common situations in which it does evaluate to $h$, which is why confidence regions are often consistent with our probabilistic intuitions.
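Before the symmetric special case below, here is a quick numerical check of this expression under made-up assumptions (a normal sampling distribution and a deliberately informative normal prior); the posterior mass of $C_D$ then comes out well below $h$.

```python
import numpy as np
from scipy import stats

# Numerical check of the expression above, with illustrative numbers:
# s(d | theta) = Normal(theta, 1), h = 0.95, an observed D, and an
# informative prior p(theta | I) = Normal(0, 0.5).
h, D = 0.95, 2.0
z = stats.norm.ppf(0.5 + h / 2)
theta = np.linspace(-10, 10, 20001)

weight = (stats.norm.pdf(D, loc=theta, scale=1.0)
          * stats.norm.pdf(theta, loc=0.0, scale=0.5))  # s(D|theta) p(theta|I)
in_CD = np.abs(D - theta) <= z                          # C_D = [D - z, D + z]

# Ratio of the two integrals on a uniform grid; the grid spacing cancels.
print(weight[in_CD].sum() / weight.sum())               # about 0.79, not 0.95
```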

For example, suppose that the prior joint PDF of $d$ and $\theta$ is symmetric, with $p_{d,\theta}(d,\theta|I) = p_{d,\theta}(\theta,d|I)$. (Clearly this involves an assumption that the PDF ranges over the same domain in $d$ and $\theta$.) Then, if the prior is $p(\theta|I) = f(\theta)$, we have $s(D|\theta)f(\theta) = s(\theta|D)f(D)$. Hence

$$P(\theta \in C_D|DI) = \frac{\int_{C_D} s(\theta|D)\,d\theta}{\int s(\theta|D)\,d\theta}\qquad\text{i.e.}\qquad P(\theta \in C_D|DI) = \int_{C_D} s(\theta|D)\,d\theta$$
From the definition of an HDR we know that for any $\psi \in \Theta$
$$\int_{H_\psi} s(d|\psi)\,dd = h \qquad\text{and therefore that}\qquad \int_{H_D} s(d|D)\,dd = h \qquad\text{or equivalently}\qquad \int_{H_D} s(\theta|D)\,d\theta = h$$
Therefore, given that $s(d|\theta)f(\theta) = s(\theta|d)f(d)$, $C_D = H_D$ implies $P(\theta \in C_D|DI) = h$. The antecedent satisfies
$$C_D = H_D \longleftrightarrow \forall\psi\,[\psi \in C_D \leftrightarrow \psi \in H_D]$$
Applying the equivalence near the top:
$$C_D = H_D \longleftrightarrow \forall\psi\,[D \in H_\psi \leftrightarrow \psi \in H_D]$$
Thus, the confidence region $C_D$ contains $\theta$ with probability $h$ if, for all possible values $\psi$ of $\theta$, the $h$-HDR of $s(d|\psi)$ contains $D$ if and only if the $h$-HDR of $s(d|D)$ contains $\psi$.

Now the symmetric relation $D \in H_\psi \leftrightarrow \psi \in H_D$ is satisfied for all $\psi$ when $s(\psi+\delta|\psi) = s(D-\delta|D)$ for all $\delta$ that span the support of $s(d|D)$ and $s(d|\psi)$. We can therefore form the following argument:

  1. $s(d|\theta)f(\theta) = s(\theta|d)f(d)$ (premise)
  2. $\forall\psi\,\forall\delta\,[s(\psi+\delta|\psi) = s(D-\delta|D)]$ (premise)
  3. $\forall\psi\,\forall\delta\,[s(\psi+\delta|\psi) = s(D-\delta|D)] \longrightarrow \forall\psi\,[D \in H_\psi \leftrightarrow \psi \in H_D]$
  4. $\forall\psi\,[D \in H_\psi \leftrightarrow \psi \in H_D]$
  5. $\forall\psi\,[D \in H_\psi \leftrightarrow \psi \in H_D] \longrightarrow C_D = H_D$
  6. $C_D = H_D$
  7. $[s(d|\theta)f(\theta) = s(\theta|d)f(d) \;\wedge\; C_D = H_D] \longrightarrow P(\theta \in C_D|DI) = h$
  8. $P(\theta \in C_D|DI) = h$

Let's apply the argument to a confidence interval on the mean of a 1-D normal distribution $(\mu, \sigma)$, given a sample mean $\bar{x}$ from $n$ measurements. We have $\theta = \mu$ and $d = \bar{x}$, so that the sampling distribution is

$$s(d|\theta) = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\, e^{-\frac{n}{2\sigma^2}(d-\theta)^2}$$
Suppose also that we know nothing about $\theta$ before taking the data (except that it is a location parameter) and therefore assign a uniform prior: $f(\theta) = k$. Clearly we now have $s(d|\theta)f(\theta) = s(\theta|d)f(d)$, so the first premise is satisfied. Let $s(d|\theta) = g\!\left((d-\theta)^2\right)$. (i.e. It can be written in that form.) Then
$$s(\psi+\delta|\psi) = g\!\left((\psi+\delta-\psi)^2\right) = g(\delta^2) \qquad\text{and}\qquad s(D-\delta|D) = g\!\left((D-\delta-D)^2\right) = g(\delta^2)$$
so that
$$\forall\psi\,\forall\delta\,\left[s(\psi+\delta|\psi) = s(D-\delta|D)\right]$$
whereupon the second premise is satisfied. Both premises being true, the eight-point argument leads us to conclude that the probability that $\theta$ lies in the confidence interval $C_D$ is $h$!
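A numerical check of this worked example, with made-up values for $\sigma$, $n$ and $\bar{x}$: on a grid, the posterior mass of the standard interval does come out at 0.95, as the argument concludes.

```python
import numpy as np
from scipy import stats

# Numerical check of the worked example, with illustrative numbers:
# theta = mu, d = x_bar, s(d | theta) = Normal(theta, sigma/sqrt(n)), and a
# flat prior f(theta) = k.  The posterior mass of C_D should be h = 0.95.
h, sigma, n, xbar = 0.95, 2.0, 25, 10.3
se = sigma / np.sqrt(n)
z = stats.norm.ppf(0.5 + h / 2)

theta = np.linspace(xbar - 10 * se, xbar + 10 * se, 20001)
weight = stats.norm.pdf(xbar, loc=theta, scale=se)     # flat prior cancels
in_CD = np.abs(xbar - theta) <= z * se                 # C_D = xbar -/+ z*se

print(weight[in_CD].sum() / weight.sum())              # approximately 0.95
```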

We therefore have an amusing irony:

  1. The frequentist who assigns the $h$ confidence interval cannot say that $P(\theta \in C_D) = h$, no matter how innocently uniform $\theta$ looks before incorporating the data.
  2. The Bayesian who would not assign an $h$ confidence interval in that way knows anyhow that $P(\theta \in C_D|DI) = h$.

Final Remarks

We have identified conditions (i.e. the two premises) under which the $h$ confidence region does indeed yield probability $h$ that $\theta \in C_D$. A frequentist will baulk at the first premise, because it involves a prior on $\theta$, and this sort of deal-breaker is inescapable on the route to a probability. But for a Bayesian it is acceptable, indeed essential. These conditions are sufficient but not necessary, so there are many other circumstances under which the Bayesian $P(\theta \in C_D|DI)$ equals $h$. Equally though, there are many circumstances in which $P(\theta \in C_D|DI) \neq h$, especially when the prior information is significant.

We have applied a Bayesian analysis just as a consistent Bayesian would, given the information at hand, including the statistic $D$. But a Bayesian, if he possibly can, will apply his methods to the raw measurements instead: to the $\{x_i\}$ rather than $\bar{x}$. Oftentimes, collapsing the raw data into summary statistics $D$ destroys information in the data; and then the summary statistics are incapable of speaking as eloquently as the original data about the parameters $\theta$.


Would it be correct to say that a Bayesian is committed to taking all the available information into account, while the interpretation given in the question ignores $D$ in some sense?
qbolec

Is this a good mental picture to illustrate the situation: imagine a grayscale image, where the intensity of pixel $(x, y)$ is the joint probability of the real parameter being $y$ and the observed statistic being $x$. In each row $y$, we mark the pixels which hold 95% of the mass of the row. For each observed statistic $x$, we define $CI(x)$ to be the set of rows which have marked pixels in column $x$. Now, if we choose $(x, y)$ randomly, then $CI(x)$ will contain $y$ iff $(x, y)$ was marked, and the mass of marked pixels is 95% for each $y$. So frequentists say that, keeping $y$ fixed, the chance is 95%; the OP says that not fixing $y$ also gives 95%; and Bayesians fix $x$ and don't know
qbolec
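For what it's worth, the grayscale picture in the comment above is easy to sketch numerically. The grid, the normal sampling distribution, and the uniform weighting over rows below are illustrative assumptions, not part of the original comment.

```python
import numpy as np
from scipy import stats

# Rows are parameter values y, columns are observed statistics x, and the
# "image" holds p(x | y) on a grid (with a uniform prior over rows this is
# proportional to the joint).  In each row we mark the densest pixels
# carrying 95% of that row's mass; CI(x) is the set of marked rows in
# column x.
y_grid = np.linspace(-4, 4, 81)
x_grid = np.linspace(-8, 8, 161)
img = stats.norm.pdf(x_grid[None, :], loc=y_grid[:, None], scale=1.0)
img /= img.sum(axis=1, keepdims=True)          # each row sums to 1

marked = np.zeros_like(img, dtype=bool)
for i, row in enumerate(img):
    order = np.argsort(row)[::-1]              # densest pixels first
    keep = order[np.cumsum(row[order]) <= 0.95]
    marked[i, keep] = True

# For a fixed row y = 0, the marked mass is (just under) 95%:
print((img[40] * marked[40]).sum())
# For a fixed column x = 0, CI(x) is the set of marked rows in that column:
print(y_grid[marked[:, 80]].min(), y_grid[marked[:, 80]].max())
```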

@qbolec It is correct to say that in the Bayesian method one cannot arbitrarily ignore some information while taking account of the rest. Frequentists say that for all $y$ the expectation of $y \in CI(x)$ (as a Boolean integer) under the sampling distribution $\mathrm{prob}(x|y, I)$ is 0.95. The frequentist 0.95 is not a probability but an expectation.
CarbonFlambe--Reinstate Monica


from a Bayesian probability perspective, why doesn't a 95% confidence interval contain the true parameter with 95% probability?

Two answers to this, the first being less helpful than the second:

  1. There are no confidence intervals in Bayesian statistics, so the question doesn't pertain.

  2. In Bayesian statistics, there are however credible intervals, which play a similar role to confidence intervals. If you view priors and posteriors in Bayesian statistics as quantifying the reasonable belief that a parameter takes on certain values, then the answer to your question is yes, a 95% credible interval represents an interval within which a parameter is believed to lie with 95% probability.

If I have a process that I know produces a correct answer 95% of the time then the probability of the next answer being correct is 0.95 (given that I don't have any extra information regarding the process).

Yes, the process guesses the right answer with 95% probability.

Similarly if someone shows me a confidence interval that is created by a process that will contain the true parameter 95% of the time, should I not be right in saying that it contains the true parameter with 0.95 probability, given what I know?

Just the same as your process, the confidence interval guesses the correct answer with 95% probability. We're back in the world of classical statistics here: before you gather the data you can say there's a 95% probability of randomly gathered data determining the bounds of the confidence interval such that the mean is within the bounds.

With your process, after you've gotten your answer, you can't say, based on whatever your guess was, that the true answer is the same as your guess with 95% probability. The guess is either right or wrong.

And just the same as your process, in the confidence interval case, after you've gotten the data and have an actual lower and upper bound, the mean is either within those bounds or it isn't, i.e. the chance of the mean being within those particular bounds is either 1 or 0. (Having skimmed the question you refer to it seems this is covered in much more detail there.)

How to interpret a confidence interval given to you if you subscribe to a Bayesian view of probability.

There are a couple of ways of looking at this:

  1. Technically, the confidence interval hasn't been produced using a prior and Bayes theorem, so if you had a prior belief about the parameter concerned, there would be no way you could interpret the confidence interval in the Bayesian framework.

  2. Another widely used and respected interpretation of confidence intervals is that they provide a "plausible range" of values for the parameter (see, e.g., here). This de-emphasises the "repeated experiments" interpretation.

Moreover, under certain circumstances, notably when the prior is uninformative (doesn't tell you anything, e.g. flat), confidence intervals can produce exactly the same interval as a credible interval. In these circumstances, as a Bayesian you could argue that had you taken the Bayesian route you would have gotten exactly the same results, and you could interpret the confidence interval in the same way as a credible interval.
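As a sketch of that flat-prior coincidence (all numbers below are made up): for a normal mean with known σ, the standard 95% CI and the central 95% credible interval computed from an explicitly flat prior land on the same endpoints.

```python
import numpy as np
from scipy import stats

# A minimal sketch of the flat-prior case, with illustrative numbers:
# normal data with known sigma.
sigma, n, xbar = 2.0, 25, 10.3
se = sigma / np.sqrt(n)
z = stats.norm.ppf(0.975)

# Frequentist interval: xbar -/+ z * se.
ci = (xbar - z * se, xbar + z * se)

# Bayesian interval: quantiles of the posterior, computed on a grid so the
# flat prior is explicit.
theta = np.linspace(xbar - 10 * se, xbar + 10 * se, 20001)
post = stats.norm.pdf(xbar, loc=theta, scale=se) * 1.0     # flat prior
cdf = np.cumsum(post) / post.sum()
credible = (np.interp(0.025, cdf, theta), np.interp(0.975, cdf, theta))

print(ci)
print(credible)   # essentially identical to the CI
```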


But surely confidence intervals exist even if I subscribe to a Bayesian view of probability; they just won't disappear, right? :) The situation I was asking about was how to interpret a confidence interval given to you if you subscribe to a Bayesian view of probability.
Rasmus Bååth

The problem is that confidence intervals aren't produced using a Bayesian methodology. You don't start with a prior. I'll edit the post to add something which might help.
TooTone


I'll give you an extreme example where they are different.

Suppose I create my 95% confidence interval for a parameter $\theta$ as follows. Start by sampling the data. Then generate a random number between 0 and 1. Call this number $u$. If $u$ is less than 0.95, then return the interval $(-\infty, \infty)$. Otherwise return the "null" interval.

Now over continued repetitions, 95% of the CIs will be "all numbers" and hence contain the true value. The other 5% contain no values and hence have zero coverage. Overall, this is a useless, but technically correct, 95% CI.

The Bayesian credible interval will be either 100% or 0%. Not 95%.
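A simulation sketch of this pathological procedure (the true parameter value below is an arbitrary placeholder): the long-run coverage is 95%, yet any single realised interval contains the parameter with probability 1 or 0, never 0.95.

```python
import numpy as np

# With probability 0.95 return (-inf, inf), otherwise return the empty
# interval.  The data themselves are irrelevant to the interval.
rng = np.random.default_rng(2)
true_theta, n_reps = 3.7, 100_000   # true_theta is a made-up placeholder

u = rng.random(n_reps)
covered = u < 0.95        # (-inf, inf) always covers; the empty set never does
print(covered.mean())     # approximately 0.95
```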


So is it correct to say that before seeing a confidence interval there is a 95% probability that it will contain the true parameter, but for any given confidence interval the probability that it covers the true parameter depends on the data (and our prior)? To be honest, what I'm really struggling with is how useless confidence intervals sound (credible intervals I like, on the other hand) and the fact that I nevertheless will have to teach them to our students next week... :/
Rasmus Bååth

This question has some more examples, plus a very good paper comparing the two approaches
probabilityislogic


"from a Bayesian probability perspective, why doesn't a 95% confidence interval contain the true parameter with 95% probability? "

In Bayesian statistics the parameter is not an unknown fixed value; it is treated as a random variable with a distribution. There is no interval containing "the true value"; from a Bayesian point of view that does not even make sense. Because the parameter is a random variable, you can know exactly the probability that it lies between x_inf and x_max if you know its distribution. It is just a different mindset about parameters: Bayesians usually use the median or mean of the parameter's distribution as an "estimate". There is no confidence interval in Bayesian statistics; the analogous notion is called a credible interval.
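As a small illustration of that point (the Beta(8, 4) posterior and the interval endpoints are made-up numbers, not from this answer): once the parameter's distribution is known, the probability that it lies in a given interval is just a difference of CDF values.

```python
from scipy import stats

# If the parameter has a known distribution -- say a Beta(8, 4) posterior
# for a proportion -- then P(x_inf < theta < x_max) is a CDF difference.
posterior = stats.beta(8, 4)
x_inf, x_max = 0.5, 0.9
print(posterior.cdf(x_max) - posterior.cdf(x_inf))
```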

Now, from a frequentist point of view, the parameter is a fixed value, not a random variable. Can you really obtain a probability interval (a 95% one)? Remember that it is a fixed value, not a random variable with a known distribution. That is why you quoted the text: "A confidence interval does not predict that the true value of the parameter has a particular probability of being in the confidence interval given the data actually obtained."

The idea of repeating the experiment over and over again is not Bayesian reasoning; it is frequentist. Imagine a real-life experiment that you can only do once in your lifetime: can you, or should you, build that confidence interval (from the classical point of view)?

But in real life the results can come out pretty close (Bayesian vs. frequentist), which is perhaps why it can be confusing.
