Combining probabilities/information from different sources


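Let's say I have three independent sources, and each of them makes a prediction about the weather tomorrow. The first one says that the probability of rain tomorrow is 0, the second says that the probability is 1, and the last one says that it is 50%. I would like to know the total probability given this information.

If I apply the multiplication theorem for independent events, I get 0, which doesn't seem correct. Why can't I multiply the three numbers if all the sources are independent? Is there some Bayesian way to update the prior as I get new information?

Note: this is not homework, it's something I've been thinking about.

— Biela Diela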

Do you know how reliable the independent sources are?

— Dilip Sarwate

No, I can assume that all sources are equally reliable.

— Biela Diela


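I'm also thinking about this. I'd add a second question: if all the predictions were 0.75, what would the combined probability be? Higher than 0.75? What is a formal framework for analyzing this kind of question?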

— Karsten W.


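There really isn't enough information. We need some model of how the predictions relate to reality.

— Glen_b -Reinstate Monica

I'm not quite sure what "all sources are equally reliable" means when the sources make statements about probabilities or degrees of confidence/belief. If we are talking about the probability that some probability has a given value, that seems to raise conceptual problems. By the way, if sources 1 and 2 are equally reliable, they must both be correct with probability 0.50... (and the probability of rain is 1/2).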

— AG


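You ask about three things: (a) how to combine several forecasts to get a single forecast, (b) whether a Bayesian approach can be used here, and (c) how to deal with zero probabilities.

Combining forecasts is a common practice. If you have several forecasts and you take their average, the resulting combined forecast tends to be more accurate than the individual forecasts. To average them you could use a weighted average where the weights are based on inverse errors (i.e. precision) or on information content. If you had knowledge about the reliability of each source, you could assign weights proportional to the reliability of each source, so that more reliable sources have a greater impact on the final combined forecast. In your case you do not have any knowledge about their reliability, so each of the forecasts gets the same weight and you can use the simple arithmetic mean of the three forecasts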

$0\%\times\frac{1}{3}+50\%\times\frac{1}{3}+100\%\times\frac{1}{3} = (0\%+50\%+100\%)/3=50\%$
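A minimal Python sketch of this averaging (the function name `linear_pool` is my own, not from the answer). With no reliability information every weight is equal; passing unequal weights gives the weighted average described above.

```python
def linear_pool(probs, weights=None):
    """Weighted arithmetic mean of probability forecasts (linear opinion pool)."""
    if weights is None:
        weights = [1.0] * len(probs)  # no reliability information: equal weights
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / total

print(linear_pool([0.0, 0.5, 1.0]))     # the question's example: 0.5
print(linear_pool([0.75, 0.75, 0.75]))  # three equal forecasts stay at 0.75
```

Note that the simple average can never leave the range spanned by the individual forecasts.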

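As @AndyW and @ArthurB suggested in the comments, other methods besides the simple weighted mean are available. Many such methods are described in the literature on averaging expert forecasts, which I was not familiar with before, so thanks, guys. When averaging expert forecasts, we sometimes want to correct for the fact that experts tend to regress toward the mean (Baron et al., 2014), or to make their forecasts more extreme (Ariely et al., 2000; Erev et al., 1994). To achieve this, one can use transformations of the individual forecasts $p_i$, e.g. the logit function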

$\mathrm{logit}(p_i) = \log\left( \frac{p_i}{1-p_i} \right) \tag{1}$

odds to the power of $a$

$g(p_i) = \left( \frac{p_i}{1-p_i} \right)^a \tag{2}$

where $0 < a < 1$, or a more general transformation of the form

$t(p_i) = \frac{p_i^a}{p_i^a + (1-p_i)^a} \tag{3}$

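where $a=1$ means that no transformation is applied, $a>1$ makes the individual forecasts more extreme, and $0 < a<1$ makes them less extreme, as shown in the picture below (see Karmarkar, 1978; Baron et al., 2014).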
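Transformations (1)-(3) are simple to write down directly. The Python sketch below (function names and example forecasts are hypothetical) averages on the logit scale and back-transforms with the inverse logit, then shows how (3) moves the same forecasts for $a > 1$ and $0 < a < 1$.

```python
import math

def logit(p):
    # Eq. (1): log-odds; undefined for p = 0 or p = 1
    return math.log(p / (1 - p))

def inv_logit(x):
    # Back-transform for (1): maps a log-odds value back to a probability
    return 1 / (1 + math.exp(-x))

def odds_power(p, a):
    # Eq. (2): the odds raised to the power a
    return (p / (1 - p)) ** a

def t(p, a):
    # Eq. (3): a = 1 leaves p unchanged, a > 1 pushes it away from 0.5,
    # 0 < a < 1 pulls it toward 0.5
    return p ** a / (p ** a + (1 - p) ** a)

forecasts = [0.6, 0.75, 0.9]  # hypothetical forecasts, none exactly 0 or 1

# Average on the logit scale, then back-transform with the inverse logit
mean_logit = sum(logit(p) for p in forecasts) / len(forecasts)
print("logit-scale average:", inv_logit(mean_logit))

print("a = 2:  ", [t(p, 2) for p in forecasts])    # more extreme
print("a = 0.5:", [t(p, 0.5) for p in forecasts])  # less extreme
```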

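After such transformations, the forecasts are averaged (using the arithmetic mean, median, weighted mean, or some other method). If equation (1) or (2) was used, the results need to be back-transformed, using the inverse logit for (1) and the inverse odds, i.e. $o/(1+o)$, for (2). Alternatively, a (weighted, normalized) geometric mean can be used (see Genest and Zidek, 1986; cf. Dietrich and List, 2014)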

$\hat p = \frac{ \prod_{i=1}^N p_i^{w_i} }{ \prod_{i=1}^N p_i^{w_i} + \prod_{i=1}^N (1 - p_i)^{w_i} } \tag{4}$

or the approach proposed by Satopää et al. (2014)

$\hat p = \frac{ \left[ \prod_{i=1}^N \left(\frac{p_i}{1-p_i} \right)^{w_i} \right]^a }{ 1 + \left[ \prod_{i=1}^N \left(\frac{p_i}{1-p_i} \right)^{w_i} \right]^a } \tag{5}$

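where $w_i$ are weights. In most cases equal weights $w_i = 1/N$ are used, unless there is a priori information that suggests another choice. Such methods are used when averaging expert forecasts to correct for under- or overconfidence. In other cases you should consider whether transforming the forecasts to be more or less extreme is justified, since it can make the resulting aggregate estimate fall outside the bounds marked by the lowest and highest individual forecasts.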
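A sketch of eqs. (4) and (5) with equal weights $w_i = 1/N$ by default (the function names are my own). With $a = 1$ the two formulas agree, since the weighted product of the odds equals the ratio of the two products in (4); with $a > 1$ the pooled value can fall outside the range of the inputs, as warned above. Neither formula is defined for the question's own 0/1 forecasts: (4) becomes $0/0$ (the sketch raises `ValueError`) and (5) divides by zero. A single forecast of exactly 0 drives the geometric pool to 0, which is the "veto" behaviour discussed in the comments.

```python
import math

def geometric_pool(probs, weights=None):
    """Eq. (4): normalized weighted geometric mean of p_i versus 1 - p_i."""
    n = len(probs)
    if weights is None:
        weights = [1.0 / n] * n  # equal weights w_i = 1/N
    yes = math.prod(p ** w for p, w in zip(probs, weights))
    no = math.prod((1 - p) ** w for p, w in zip(probs, weights))
    if yes + no == 0:
        raise ValueError("forecasts of exactly 0 and exactly 1 contradict each other")
    return yes / (yes + no)

def satopaa_pool(probs, a=1.0, weights=None):
    """Eq. (5): weighted geometric mean of the odds, raised to the power a."""
    n = len(probs)
    if weights is None:
        weights = [1.0 / n] * n
    odds = math.prod((p / (1 - p)) ** w for p, w in zip(probs, weights)) ** a
    return odds / (1 + odds)

print(geometric_pool([0.6, 0.75, 0.9]))
print(satopaa_pool([0.6, 0.75, 0.9]))         # a = 1: same value as eq. (4)
print(satopaa_pool([0.75, 0.75, 0.75], a=2))  # about 0.9, above every input
print(geometric_pool([0.0, 0.6, 0.9]))        # 0.0: one zero vetoes the rest
```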

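If you have a priori knowledge about the probability of rain, you can apply Bayes' theorem to update the forecasts given the a priori probability of rain, in a similar fashion as described here. There is also a simple approach that could be applied: calculate a weighted average of your $p_i$ forecasts (as described above) in which the prior probability $\pi$ is treated as an additional data point with some prespecified weight $w_{\pi}$, as in this IMDB example (see also the source, or here and here for discussion; cf. Genest and Schervish, 1985), i.e.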

$\hat p = \frac{ \left(\sum_{i=1}^N p_i w_i \right) + \pi w_{\pi} }{ \left(\sum_{i=1}^N w_i \right) + w_{\pi} } \tag{6}$

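From your question, however, it does not follow that you have any a priori knowledge about your problem, so you would probably use a uniform prior, i.e. assume an a priori $50\%$ chance of rain, and this does not really change much in the example you provided.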
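Eq. (6) as a short Python sketch. The prior weight $w_{\pi}$ is your own choice, and the value 1 used below is arbitrary. With a uniform prior the question's example stays at 50%, while three forecasts of 75% are pulled toward the prior.

```python
def pool_with_prior(probs, weights, prior, prior_weight):
    """Eq. (6): weighted mean of the forecasts p_i, with the prior pi added
    as one extra data point of weight w_pi."""
    num = sum(p * w for p, w in zip(probs, weights)) + prior * prior_weight
    den = sum(weights) + prior_weight
    return num / den

# Equal weights and a uniform prior pi = 0.5 with an (arbitrary) weight of 1
print(pool_with_prior([0.0, 1.0, 0.5], [1, 1, 1], prior=0.5, prior_weight=1))     # 0.5
print(pool_with_prior([0.75, 0.75, 0.75], [1, 1, 1], prior=0.5, prior_weight=1))  # 0.6875
```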

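To deal with zeros, there are several possible approaches. First, you should notice that a $0\%$ chance of rain is not a really reliable value, since it says that it is impossible that it will rain. Similar problems often occur in natural language processing when some values that could possibly occur are not observed in your data (e.g. you count frequencies of letters, and some uncommon letter does not occur in your data at all). In this case the classical estimator of probability, i.e.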

$p_i = \frac{n_i}{\sum_i n_i}$

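where $n_i$ is the number of occurrences of the $i$th value (out of $d$ categories), gives you $p_i = 0$ if $n_i = 0$. This is called the zero-frequency problem. For such values you know that their probability is nonzero (they exist!), so this estimate is obviously incorrect. There is also a practical concern: multiplying and dividing by zeros leads to zeros or undefined results, so zeros are problematic to deal with.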

The easy and commonly applied fix is to add some constant $\beta$ to your counts, so that

$p_i = \frac{n_i + \beta}{(\sum_i n_i) + d\beta}$

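Common choices for $\beta$ are $1$, i.e. applying a uniform prior based on Laplace's rule of succession, $1/2$ for the Krichevsky-Trofimov estimate, or $1/d$ for the Schurmann-Grassberger (1996) estimator. Notice, however, that what you do here is apply out-of-data (prior) information in your model, so it gets a subjective, Bayesian flavor. When using this approach you have to remember the assumptions you made and take them into account. The fact that we have strong a priori knowledge that there should not be any zero probabilities in our data directly justifies the Bayesian approach here. In your case you do not have frequencies but probabilities, so you would be adding some very small value to correct for zeros. Notice, however, that in some cases this approach may have bad consequences (e.g. when dealing with logs), so it should be used with caution.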
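A sketch of the additive (pseudo-count) estimator above for the three common choices of $\beta$. The counts are made up, and `nudge` is a hypothetical helper illustrating the "add a very small value" idea for probabilities rather than counts: it treats $p$ and $1-p$ as two categories and smooths them with a tiny $\beta$.

```python
def smoothed_probs(counts, beta):
    """p_i = (n_i + beta) / (sum_i n_i + d * beta) for d categories;
    beta = 0 gives the classical estimator n_i / sum_i n_i."""
    d = len(counts)
    total = sum(counts) + d * beta
    return [(n + beta) / total for n in counts]

def nudge(p, eps=1e-3):
    """Hypothetical helper: move a probability slightly away from 0 and 1
    by treating (p, 1 - p) as two 'counts' with pseudo-count eps."""
    return (p + eps) / (1 + 2 * eps)

counts = [5, 3, 0, 2]  # made-up letter counts; the third letter never occurs
for name, beta in [("classical", 0), ("Laplace", 1), ("Krichevsky-Trofimov", 0.5),
                   ("Schurmann-Grassberger", 1 / len(counts))]:
    print(name, [round(p, 3) for p in smoothed_probs(counts, beta)])

print([nudge(p) for p in [0.0, 0.5, 1.0]])
```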

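Schurmann, T., & Grassberger, P. (1996). Entropy estimation of symbol sequences. Chaos, 6, 414-427.

Ariely, D., Tung Au, W., Bender, R. H., Budescu, D. V., Dietz, C. B., Gu, H., Wallsten, T. S., & Zauberman, G. (2000). The effects of averaging subjective probability estimates between and within judges. Journal of Experimental Psychology: Applied, 6(2), 130.

Baron, J., Mellers, B. A., Tetlock, P. E., Stone, E., & Ungar, L. H. (2014). Two reasons to make aggregated probability forecasts more extreme. Decision Analysis, 11(2), 133-145.

Erev, I., Wallsten, T. S., & Budescu, D. V. (1994). Simultaneous over- and underconfidence: The role of error in judgment processes. Psychological Review, 101(3), 519.

Karmarkar, U. S. (1978). Subjectively weighted utility: A descriptive extension of the expected utility model. Organizational Behavior and Human Performance, 21(1), 61-72.

Turner, B. M., Steyvers, M., Merkle, E. C., Budescu, D. V., & Wallsten, T. S. (2014). Forecast aggregation via recalibration. Machine Learning, 95(3), 261-289.

Genest, C., & Zidek, J. V. (1986). Combining probability distributions: A critique and an annotated bibliography. Statistical Science, 1, 114-135.

Satopää, V. A., Baron, J., et al. (2014). Combining multiple probability predictions using a simple logit model. International Journal of Forecasting, 30(2), 344-356.

Genest, C., & Schervish, M. J. (1985). Modeling expert judgments for Bayesian updating. The Annals of Statistics, 1198-1212.

Dietrich, F., & List, C. (2014). Probabilistic opinion pooling. (Unpublished.)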

— Tim

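I'd like to add to this rather than start a new answer. Another well-known method is to combine the three (or N) probabilities by taking their geometric mean rather than their arithmetic mean. Hinton points out that this gives a model with a very high or very low probability "veto power" over the others, rather than averaging everything, which can sometimes work against you.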

— 朱巴卜2015年

So if all three forecasts were 75% and there were no information about their reliability, the final forecast would be 75%?

— Karsten W.

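@KarstenW. Yes, why would you expect something different? If you have no prior information, then this is the only information you have, so you have no reason to think that the final result would be different...

— Tim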


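I haven't read any of Tetlock's academic papers, but that is where I would start, e.g. Two Reasons to Make Aggregated Probability Forecasts More Extreme. I'll take a closer look at Phil's exact wording; I may have misremembered the word extremify.

— Andy W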


I was close with extremified, but not quite. I should have used extremized, see here. Besides the Baron et al. paper mentioned, I see Ville Satopää has some work on the topic arxiv.org/abs/1506.06405.

— Andy W


There are two ways to think of the problem. One is to say that the sources observe a noisy version of the latent variable "it will rain / it will not rain".

For instance, we could say that each source draws its estimates from a $\mathrm{Beta}(a+b,a)$ distribution if it will rain, and from a $\mathrm{Beta}(a,a+b)$ distribution if it will not.

In this case, the $a$ parameter drops out, and the three forecasts $x$, $y$, and $z$ would be combined as

$p = \frac{1}{1+\left(\frac{1}{x}-1\right)^b\left(\frac{1}{y}-1\right)^b\left(\frac{1}{z}-1\right)^b}$

$b$ is a parameter controlling how under- ($b>1$) or over- ($b<1$) confident the sources are. If we assume that the sources' estimates are unbiased, then $b = 1$ and the estimate simplifies to

$\frac{p}{1-p} = \frac{x}{1-x} \frac{y}{1-y} \frac{z}{1-z}$

Which is just saying: the odds of rain are the product of the odds given by each source (the model implicitly starts from prior odds of 1, i.e. a 50/50 prior on rain). Note that this is not well defined if one source gives an estimate of exactly $1$ and another gives an estimate of exactly $0$, but under our model this never happens: the sources are never that confident. Of course, we could patch the model to allow for this to happen.
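A direct transcription of the combination formula above into Python (the name `combine_odds` is mine). It assumes, as the model does, a 50/50 prior on rain and no forecast of exactly 0 or 1; a forecast of exactly 0 raises `ZeroDivisionError`.

```python
import math

def combine_odds(forecasts, b=1.0):
    """p = 1 / (1 + prod_i (1/x_i - 1)^b).

    With b = 1 this multiplies the odds x_i / (1 - x_i); b > 1 treats the
    sources as underconfident, b < 1 as overconfident."""
    inverse_odds = math.prod((1 / x - 1) ** b for x in forecasts)
    return 1 / (1 + inverse_odds)

print(combine_odds([0.75, 0.75, 0.75]))  # about 0.9643, i.e. 27/28
print(combine_odds([0.9, 0.5]))          # about 0.9: a 50% source has odds 1
```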

This model works better if you're thinking of three people telling you whether or not it rained yesterday. In practice, we know that there is an irreducible random component in the weather, and so it might be better to assume that nature first picks a probability of rain, which is noisily observed by the sources, and then flips a biased coin to decide whether or not it is going to rain.

In that case, the combined estimate would look much more like an average between the different estimates.

— Arthur B.

What would x, y, z be in this model?

— Karsten W.

They would be the three different forecasts.

— Arthur B.

The example you were wondering about would be $x = y = z = \frac{3}{4}$. In the framework I suggested as a reasonable choice, you would have $p = \frac{27}{28}$. This is because $\frac{3}{4}$ represents 3 to 1 odds, so the product represents 27 to 1 odds, or a $\frac{27}{28}$ probability.

— Arthur B.

Going from 3/4 to 27/28 is a bit extreme, it is like three people were telling you that the sky is dark blue and you concluded it is black...

— Tim

It depends on the model. Here I'm assuming each source has a noisy view on a latent binary variable, rain or no rain. It's more like three different people tell you it rained yesterday. You can also model the system as there being a latent probability of rain and the forecast sources as getting a noisy version of that forecast.

— Arthur B.


In the framework of Transferable Belief Model (TBM), it is possible to combine different predictions using for instance the "conjunctive rule of combination". In order to apply this rule, you need to transform the probabilities of the predictions into basic belief assignments. This can be achieved with the so-called Least-Committed-Principle. In R:

library(ibelief)
#probabilities
p1 <- c(0.99, 0.01) # exact 0 and 1 give bad results, so use 0.99/0.01 instead
p2 <- c(0.01, 0.99)
p3 <- c(0.5, 0.5)

# basic belief assignment, 
# each row represents a subset of (rain, not rain)
# each column represents one prediction
Mat <- LCPrincple(rbind(p1,p2,p3))

# combine beliefs
m <- DST(Mat, 1)

# resulting probability distribution (pignistic probability)
mtobetp(m)
# returns 0.5 and 0.5

For the second example of three independent predictions of 0.75, this approach returns a higher value:

p4 <- c(0.75, 0.25)
Mat <- LCPrincple(rbind(p4,p4,p4))
m <- DST(Mat, 1)
mtobetp(m)
#returns 0.9375 0.0625

This is not very far from the Bayesian approach shown in Arthur B's answer.
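For readers without R, here is a rough Python reconstruction of the same pipeline on the frame {rain, no rain}. It is my own sketch, not the ibelief implementation, and rests on three assumptions: the least-committed BBA on a binary frame puts mass $|p_1 - p_2|$ on the more likely outcome and the rest on the whole frame; `DST(Mat, 1)` is the unnormalized conjunctive rule; and `mtobetp` renormalizes away the conflict mass. Under these assumptions it reproduces both numbers printed above. Exact 0 and 1 forecasts produce total conflict and a division by zero, which matches the warning in the R comment.

```python
def lcp_binary(p_rain):
    """Least committed BBA on {rain, no rain} whose pignistic probability is p_rain.
    Keys: 'R' = {rain}, 'N' = {no rain}, 'RN' = whole frame (ignorance)."""
    q = 1 - p_rain
    if p_rain >= q:
        return {"R": p_rain - q, "N": 0.0, "RN": 2 * q}
    return {"R": 0.0, "N": q - p_rain, "RN": 2 * p_rain}

SETS = {"": set(), "R": {"R"}, "N": {"N"}, "RN": {"R", "N"}}

def conjunctive(m1, m2):
    """Unnormalized conjunctive rule: each product of masses goes to the
    intersection of the focal sets; mass on '' (empty set) is the conflict."""
    out = {key: 0.0 for key in SETS}
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = SETS[a] & SETS[b]
            key = "".join(sorted(inter, reverse=True))  # "", "R", "N" or "RN"
            out[key] += ma * mb
    return out

def pignistic_rain(m):
    """Pignistic probability of rain: split ignorance evenly, drop the conflict."""
    return (m["R"] + m["RN"] / 2) / (1 - m.get("", 0.0))

def combine(probs):
    m = lcp_binary(probs[0])
    for p in probs[1:]:
        m = conjunctive(m, lcp_binary(p))
    return pignistic_rain(m)

print(combine([0.99, 0.01, 0.5]))   # close to 0.5
print(combine([0.75, 0.75, 0.75]))  # 0.9375
```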

— Karsten W.

I think it's worthwhile to look at the weighting scheme based on inverse errors mentioned in one of the answers. If the sources are truly independent and we constrain the weights to sum to one, the weights are given by

$w_1 = {{\sigma_2^2 \sigma_3^2} \over {\sigma_1^2 \sigma_2^2 + \sigma_1^2 \sigma_3^2 + \sigma_2^2 \sigma_3^2}},\ w_2 = {{\sigma_1^2 \sigma_3^2} \over {\sigma_1^2 \sigma_2^2 + \sigma_1^2 \sigma_3^2 + \sigma_2^2 \sigma_3^2}},\ w_3 ={{\sigma_1^2 \sigma_2^2} \over {\sigma_1^2 \sigma_2^2 + \sigma_1^2 \sigma_3^2 + \sigma_2^2 \sigma_3^2}}.$

If, as the OP states, the forecasts are equally reliable, then all weights will simplify to $\frac{1}{3}$ and the combined forecast for the given example will be 50%.

Note that the values of $\sigma_i$ do not need to be known if their relative proportions are known. So if $\sigma_1^2 : \sigma_2^2 : \sigma_3^2 = 1:2:4,$ then the forecast in the example would be

$f = \frac{8}{14}\times 0 + \frac{4}{14}\times 1 + \frac{2}{14}\times 0.5 = \frac{5}{14} \approx 0.3571$
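The same computation as a short Python sketch (the function name is mine). Only the ratios of the variances enter, so passing 1:2:4 reproduces the weights $8/14$, $4/14$, $2/14$ and the forecast above, and equal variances give weights of $1/3$.

```python
def inverse_variance_weights(variances):
    """Weights proportional to 1 / sigma_i^2, normalized to sum to one.
    Only the ratios of the variances matter, so relative values suffice."""
    inverse = [1 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]

forecasts = [0.0, 1.0, 0.5]
weights = inverse_variance_weights([1, 2, 4])  # sigma_1^2 : sigma_2^2 : sigma_3^2 = 1:2:4
f = sum(w * x for w, x in zip(weights, forecasts))
print(weights)  # 8/14, 4/14, 2/14
print(f)        # 5/14, about 0.3571
```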

— soakley

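Their numbers for rain likelihood are only half the story, as we'd have to temper their predictions with the probability that they are accurate when making guesses.

Because something like rain is mutually exclusive (in this case it either rains or it doesn't), they cannot all be correct at the same time with 75% probability, as Karsten suggested (I think; I find it hard to make sense of what I've read when looking up "combined probabilities").

Taking their individual ability to predict the weather into account, we can take a stab (à la Thomas Bayes, as in a generally blind shot in the dark) at the chance that it will rain tomorrow.

Station 1 is correct in its predictions 60% of the time, the second 30% of the time, and the last 10% of the time.

$E[\text{rain}] = P_x X + P_y Y + P_z Z$ is the form we're looking at here:

$(.6)(0) + (.3)(1) + (.1)(.5) = E[\text{rain}] = 35\%$ chance of rain, given the estimated accuracies.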

— 哈沃克

This algorithm can produce values above 1.

— Andy W


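There are a lot of complicated answers to this question, but what about the inverse-variance weighted mean: https://en.wikipedia.org/wiki/Inverse-variance_weighting

If an experimenter makes n measurements of the same quantity with n different instruments of varying measurement quality, rather than n repeated measurements with one instrument...

Each random variable is weighted in inverse proportion to its variance.

The inverse-variance weighted average seems very easy to compute, and as a bonus it has the smallest variance among all weighted averages.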

— 来福士

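To combine reliabilities, my go-to formula is r1 × r2 × r3 ÷ (r1 × r2 × r3 + (1 - r1) × (1 - r2) × (1 - r3)). So for 3 sources of 75% reliability all saying the same thing: .75^3 ÷ (.75^3 + .25^3) ≈ 0.964, i.e. about 96% reliability of the combined response.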

— user3902302

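This doesn't seem to be a correct answer to the question.

— Michael R. Chernick

Admittedly, this is more a response to KarstenW's comment than a direct answer to the question.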

— user3902302