What is the chance that this person is female?


32

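There is a person behind a curtain. I do not know whether this person is female or male.

I know the person has long hair, and that 90% of all people with long hair are female.

I know the person has the rare blood type AX3, and that 80% of all people with this blood type are female.

What is the chance that this person is female?

Note: the original formulation has since been expanded with two assumptions: 1. Blood type and hair length are independent. 2. The ratio of males to females in the population at large is 50:50.

(The specific scenario here is not so important. Rather, I have an urgent project that requires me to have the correct approach to this question. My gut feeling is that this is a simple probability question with a simple, definitive answer, rather than something with multiple debatable answers depending on different statistical theories.)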


1
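There are not multiple theories of probability, but people are notoriously bad at reasoning correctly about probabilities. (Augustus DeMorgan, an accomplished mathematician, gave up the study of probability because of its difficulties.) Don't look to debates: look for appeals to the principles of probability (such as Kolmogorov's axioms). Don't let this be settled democratically: your question is attracting many wrong answers which, even if some of them happen to agree, are merely collectively wrong. @Michael C provides good guidance; my reply attempts to show you why he is right.
whuber

@Whuber, if independence is assumed, do you agree that 0.97297 is the correct answer? (I believe that without this assumption the answer could be anywhere between 0% and 100%; your diagrams show this nicely.)
ProbablyWrong

Independent of what, exactly? Are you suggesting that women and men have the same hairstyles? As you say in your question, the particular scenario involving gender, hair, and blood type may not be relevant: that tells me you are trying to understand how to solve such problems in general. To do that, you will need to know which assumptions imply which conclusions. So you need to pay very careful attention to the assumptions you are willing to make and determine exactly what they allow you to conclude.
whuber

3
The independence to explore concerns the combination of all three characteristics. For example, if AX3 is a marker for a syndrome that includes baldness in women (but not in men), then any long-haired person with AX3 is necessarily male, making the probability of being female 0%, not 97.3%. I hope this makes it clear that everyone who gives a definite answer to this question must be making additional assumptions, even if they do not explicitly acknowledge them. A truly useful answer, IMHO, would show directly how different assumptions lead to different results.
whuber

2
You are missing the probability that a female does NOT have long hair. This is a critical measure.
Daniel R Hicks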

Answers:


35

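Many people find it helpful to think in terms of a "population," subgroups within it, and proportions (rather than probabilities). This lends itself to visual reasoning.

I will explain the figures in detail, but the intention is that a quick comparison of the two figures should immediately and convincingly show how and why no specific answer to the question can be given. A slightly longer examination will suggest what additional information would be useful for determining an answer, or at least for obtaining bounds on the answer.

[Figure: Venn diagram]

Legend

Cross-hatching: female / Solid background: male.

Top: long hair / Bottom: short hair.

Right (colored): AX3 / Left (uncolored): non-AX3.

Data

The top cross-hatching is 90% of the top rectangle ("90% of all people with long hair are female").

The total cross-hatching in the right colored rectangle is 80% of that rectangle ("80% of all people with this blood type are female").

Explanation

This diagram shows schematically how the population (of all females and non-females under consideration) can be simultaneously partitioned into females/non-females, AX3/non-AX3, and long-haired/non-long-haired ("short"). It uses area, at least approximately, to represent proportion (with some exaggeration to make the picture clearer).

Evidently these three binary classifications create eight possible groups. Each group appears here.

The information given states that the upper cross-hatched rectangle (long-haired females) comprises 90% of the upper rectangle (all long-haired people). It also states that the combined cross-hatched parts of the colored rectangles (long-haired females with AX3 and short-haired females with AX3) comprise 80% of the colored region at right (all people with AX3). We are told that someone lies in the upper right corner (arrow): a long-haired person with AX3. What proportion of this rectangle is cross-hatched (female)?

I have also (implicitly) assumed that blood type and hair length are independent: the proportion of the upper rectangle (long hair) that is colored (AX3) equals the proportion of the lower rectangle (short hair) that is colored (AX3). That is what independence means. It is a fair and natural assumption to make when addressing questions like this, but of course it needs to be stated.

The position of the upper cross-hatched rectangle (long-haired females) is unknown. We can imagine sliding the upper cross-hatched rectangle from side to side, and sliding the lower cross-hatched rectangle from side to side and possibly changing its width. If we do this so that 80% of the colored rectangle remains cross-hatched, such an alteration will not change any of the stated information, yet it can alter the proportion of females in the upper right rectangle. Evidently that proportion could be anywhere between 0% and 100% and still be consistent with the information given, as in this image:

[Figure 2]


One strength of this method is that it establishes the existence of multiple answers to the question. One could translate all of this algebraically and, by stipulating probabilities, offer specific situations as possible examples, but then the question would arise whether such examples are really consistent with the data. For instance, if someone were to suggest that perhaps 50% of long-haired people are AX3, it is not evident at the outset from all the available information that this is even possible. These (Venn) diagrams of the population and its subgroups make such things clear.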
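
The sliding-rectangle argument can also be checked numerically. What follows is only a sketch and not part of the original answer: the names `joint_table` and `prob`, the values P(long hair) = 0.2 and P(AX3) = 0.1, and the 50:50 sex ratio are all assumptions chosen for illustration. It builds two complete population tables that satisfy every stated condition, yet the long-haired AX3 group is 0% female in one and 100% female in the other.

```python
def joint_table(female_share, p_long=0.2, p_ax3=0.1, p_female=0.5):
    """Proportions of all eight (long hair, AX3, female) groups.

    female_share is the fraction of long-haired AX3 people who are female.
    p_long, p_ax3 and p_female are assumed values chosen for illustration.
    """
    p_both = p_long * p_ax3                  # hair length and blood type independent
    x = female_share * p_both                # long-haired AX3 females
    y = p_female - 0.9 * p_long - 0.8 * p_ax3 + x   # short-haired non-AX3 females
    table = {
        (1, 1, 1): x,
        (1, 1, 0): p_both - x,
        (1, 0, 1): 0.9 * p_long - x,         # 90% of long-haired people are female
        (1, 0, 0): 0.1 * p_long - (p_both - x),
        (0, 1, 1): 0.8 * p_ax3 - x,          # 80% of AX3 people are female
        (0, 1, 0): 0.2 * p_ax3 - (p_both - x),
        (0, 0, 1): y,
        (0, 0, 0): (1 - p_long) * (1 - p_ax3) - y,
    }
    if min(table.values()) < -1e-12:
        raise ValueError("inputs are inconsistent with the stated facts")
    return table


def prob(table, long=None, ax3=None, female=None):
    """Sum the proportions of all groups matching the given values (None = any)."""
    wanted = (long, ax3, female)
    return sum(p for key, p in table.items()
               if all(w is None or w == k for w, k in zip(wanted, key)))


for share in (0.0, 1.0):
    t = joint_table(share)
    print(
        "P(F|long) =", round(prob(t, long=1, female=1) / prob(t, long=1), 3),
        "| P(F|AX3) =", round(prob(t, ax3=1, female=1) / prob(t, ax3=1), 3),
        "| P(F) =", round(prob(t, female=1), 3),
        "| P(F|long, AX3) =",
        round(prob(t, long=1, ax3=1, female=1) / prob(t, long=1, ax3=1), 3),
    )
```

With other assumed values of `p_long` and `p_ax3` the function may raise `ValueError`, which corresponds to a position of the rectangles that the stated percentages do not allow.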


3
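Whuber, assuming blood type and hair length are independent, surely the proportion of long-haired women with AX3 should be the same as the proportion of short-haired women with AX3? That is, you do not have the flexibility to move the rectangles around in the way you suggest... If we also assume that the ratio of men to women in the population at large is 50:50, does that give us enough information to solve the problem with a single, indisputable answer?
ProbablyWrong

@whuber +1 very nice.
Michael R. Chernick 2012

5
ProbablyWrong, the question in your comment needs close attention: because it refers to females, it makes an additional assumption of independence conditional on gender. The (unconditional) independence assumption for hair and blood type makes no reference to gender at all, so to understand what it means, erase the cross-hatching from the figure. I hope this shows why we have the flexibility to position the cross-hatching anywhere within the upper and lower rectangles.
whuber

1
@whuber, I like this. However, I have 2 questions / points needing clarification: 1. The figures appear to assume population proportions for long vs. short hair (about 6:4) and ~AX3 vs. AX3 (about 85:15), but this is not mentioned in the original question, nor is it discussed in your explanation of the figures. I suspect the population proportions are irrelevant. Am I right / can you clarify this in your explanation? 2. I think this situation ends up producing the same phenomenon as Simpson's paradox, just framed differently (approaching it from the other direction). Is that a fair assessment?
gung - Reinstate Monica

3
@gung, thank you for the clarifications. The figures certainly have to represent some proportions in order to work at all, but any proportions not specifically stipulated in the problem statement are free to vary. (I did construct this figure so that about 50% of the population is female, in anticipation of subsequent edits.) The idea of using this kind of graphical representation to understand Simpson's paradox is intriguing; I think it has merit.
whuber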

13

This is a problem in conditional probability. You know that the person has long hair and blood type Ax3. Let

     A={'The person has long hair'}              B={'The person has blood type Ax3'}              C={'The person is female'}.

So you seek P(C|A and B). You know that P(C|A)=0.9 and P(C|B)=0.8. Is that enough to calculate P(C|A and B)? Suppose P(A and B and C)=0.7. Then

P(C|A and B)=P(A and B and C)/P(A and B)=0.7/P(A and B).

Suppose P(A and B)=0.8. Then, by the above, P(C|A and B)=0.875. On the other hand, if P(A and B)=0.9, then P(C|A and B)=0.78.

Now both are possible when P(C|A)=0.9 and P(C|B)=0.8. So we cannot tell for sure what P(C|A and B) is.


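Hello Michael: if I read you correctly, you are saying that the question as posed cannot be answered, right? Or, to put it another way, that you need more information to answer it? 1. Suppose the rare blood type in my original question has no effect on a person's desire or ability to grow long hair. Can the question be answered now? 2. Do you agree that the answer must be greater than 0.9? (Because you have a second, independent piece of information, the blood type, reinforcing the hypothesis that the person is female.)
ProbablyWrong

2
If A and B are independent, then P(A and B)=P(A)P(B), and you would need to specify what proportion of people have long hair, i.e. P(A), and what proportion of people have blood type Ax3, i.e. P(B). Also, you cannot say that the answer must be greater than 0.9, which amounts to saying that P(C|A and B)>0.9 (I really don't see why it would be).
Néstor

2
@ProbablyWrong. Yes, the question as originally stated does not have enough information to provide a unique answer.
Michael R. Chernick

@Néstor, Michael, I disagree that we need to know what proportion of people have long hair, or what proportion have AX3 blood. I think the original question can be answered uniquely without knowing these (assuming A and B are independent, which we all have, and assuming we know the ratio of men to women in the population at large; it is not unreasonable to assume that it is 50:50, I think).
ProbablyWrong

7
Why do you think that P(C|A and B)=P(A and B and C)×P(A and B)?? Using the definition of conditional probability,
$P(C|AB)=\frac{P(C(AB))}{P(AB)}=\frac{P(ABC)}{P(AB)}$.
Dilip Sarwate 2012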

4

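Fascinating discussion! I wonder whether, if we also specified P(A) and P(B), the range of P(C|A,B) would not be much narrower than the full interval [0,1], simply because of the many constraints we have?

Sticking to the notation introduced above:

A = the event that the person has long hair

B = the event that the person has blood type AX3

C = the event that the person is female

P(C|A) = 0.9

P(C|B) = 0.8

P(C) = 0.5 (i.e. let's assume an equal ratio of men and women in the population at large)

It does not seem possible to assume that events A and B are conditionally independent given C! That leads directly to a contradiction: if

$$P(AB|C) = P(A|C)\,P(B|C) = \frac{P(C|A)P(A)}{P(C)} \cdot \frac{P(C|B)P(B)}{P(C)}$$

then

$$P(C|AB) = P(AB|C)\,\frac{P(C)}{P(AB)} = \frac{P(C|A)P(A)}{P(C)} \cdot \frac{P(C|B)P(B)}{P(C)} \cdot \frac{P(C)}{P(AB)}$$

If we now assume that A and B are independent as well, $P(AB)=P(A)P(B)$, most terms cancel and we end up with

$$P(C|AB) = \frac{P(C|A)\,P(C|B)}{P(C)} = \frac{0.9 \times 0.8}{0.5} > 1$$

Following up on whuber's wonderful geometric representation of the problem: while it is true that, in general, $P(C|AB)$ can take any value in the interval $[0,1]$, the geometric constraints do significantly narrow the range of possible values when $P(A)$ and $P(B)$ are not "too small". (We can also place upper bounds on the marginals $P(A)$ and $P(B)$.)

Let us compute the smallest possible value of $P(C|AB)$ under the following geometric constraints:

1. The fraction of the upper rectangle (A TRUE) covered by the upper hatched rectangle must equal P(C|A)=0.9

2. The sum of the areas of the two hatched rectangles must equal P(C)=0.5

3. The hatched fraction of the two colored rectangles combined (i.e. the hatched rectangles' overlap with event B) must equal P(C|B)=0.8

4. (trivial) The upper hatched rectangle cannot be moved beyond the left boundary, nor beyond its position of minimum overlap on the left.

5. (trivial) The lower hatched rectangle cannot be moved beyond the right boundary, nor beyond its position of maximum overlap on the right.

These constraints limit how freely we can slide the hatched rectangles and in turn generate a lower bound for $P(C|AB)$. The figure below (created with this R script) shows two examples.

[Figure: two example configurations of the hatched rectangles]

Running through a range of possible values of P(A) and P(B) (R script) generates this graph.

[Figure: lower bound of P(C|A,B) over a range of P(A) and P(B)]

In conclusion, for given P(A) and P(B) we can place a lower bound on the conditional probability P(C|A,B).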
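
For readers without the figures, the bound can be reproduced algebraically. The sketch below is not the R script mentioned above; the function name `female_share_bounds` and its parameters are made up for illustration. It assumes only that P(AB) = P(A)P(B) and finds the range of P(ABC) for which all eight groups have non-negative size.

```python
def female_share_bounds(p_a, p_b, p_c=0.5, c_given_a=0.9, c_given_b=0.8):
    """Return (low, high) for P(C|A,B), or None if the inputs are infeasible.

    Assumes only unconditional independence of A and B: P(AB) = P(A) P(B).
    """
    p_ab = p_a * p_b
    base = c_given_a * p_a + c_given_b * p_b    # P(AC) + P(BC)
    # Write x = P(ABC). Each of the eight groups must have non-negative size.
    low = max(
        0.0,
        p_ab - (1 - c_given_a) * p_a,   # long-haired non-AX3 males >= 0
        p_ab - (1 - c_given_b) * p_b,   # short-haired AX3 males >= 0
        base - p_c,                     # short-haired non-AX3 females >= 0
    )
    high = min(
        p_ab,                           # long-haired AX3 males >= 0
        c_given_a * p_a,                # long-haired non-AX3 females >= 0
        c_given_b * p_b,                # short-haired AX3 females >= 0
        (1 - p_a) * (1 - p_b) - p_c + base,  # short-haired non-AX3 males >= 0
    )
    if low > high + 1e-12:
        return None
    return low / p_ab, high / p_ab


print(female_share_bounds(0.2, 0.1))   # small marginals: wide range
print(female_share_bounds(0.4, 0.1))   # larger P(A): lower bound rises
```

Under these assumptions, P(A) = 0.2 and P(B) = 0.1 leave the whole interval from 0 to 1 open, while P(A) = 0.4 and P(B) = 0.1 restrict P(C|A,B) to roughly 0.5 to 1.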


2
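Markus, the first paragraph is a separate question rather than an answer. The material that follows looks like a good observation, but it is hard to follow without being told what A, B, and C stand for. Please remember that different users will see the answers in different orders, depending on their preferences and on when the answers were last edited, so each answer must be readable independently of the others (although you can of course link to other answers).
whuber

1
@whuber: thanks for the helpful comments! I hope the new edit makes it more readable and clear.
Markus Loecher

@whuber and others: I had hoped to revive the discussion, but the thread seems to have gone inactive? Does no one have any more comments?
Markus Loecher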

1

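The hypothesis is that the person behind the curtain is a woman.

We have been given 2 pieces of evidence, namely:

Evidence 1: We know that the person has long hair (and we are told that 90% of all people with long hair are female)

Evidence 2: We know that the person has the rare blood type AX3 (and we are told that 80% of all people with this blood type are female)

Given Evidence 1 alone, we can state that the person behind the curtain has a 0.9 probability of being a woman (assuming a 50:50 split between men and women).

Regarding the question raised earlier in the thread, namely "Do you agree that the answer must be greater than 0.9?", without doing any math I would say intuitively that the answer must be "yes" (it is greater than 0.9). The logic is that Evidence 2 is supporting evidence (again, assuming a 50:50 split between the number of men and women in the world). If we were told that 50% of all people with AX3 type blood were women, then Evidence 2 would be neutral and have no bearing. But since we are told that 80% of all people with this blood type are women, Evidence 2 is supporting evidence and logically should push the final probability of a woman above 0.9.

To calculate a specific probability, we can apply Bayes' rule to Evidence 1 and then use Bayesian updating to apply Evidence 2 to the new hypothesis.

Suppose:

A = the event that the person has long hair

B = the event that the person has blood type AX3

C = the event that the person is female (assumed to be 50%)

Applying Bayes' rule to Evidence 1:

P(C|A) = (P(A|C) * P(C)) / P(A)

In this case, again assuming a 50:50 split between men and women:

P(A) = (0.5 * 0.9) + (0.5 * 0.1) = 0.5

So, P(C|A) = (0.9 * 0.5) / 0.5 = 0.9 (not surprising, but it would be different if we did not have a 50:50 split between men and women)

Using Bayesian updating to apply Evidence 2, and plugging in 0.9 as the new prior probability, we have:

P(C|A AND B) = (P(B|C) * 0.9) / P(E)

Here, P(E) is the probability of Evidence 2, given the hypothesis that the person already has a 90% chance of being female.

P(E) = (0.9 * 0.8) + (0.1 * 0.2) [this is the law of total probability: (P(woman) * P(AX3|woman)) + (P(man) * P(AX3|man))], so P(E) = 0.74

So, P(C|A AND B) = (0.8 * 0.9) / 0.74 = 0.97297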


1
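Several statements in your answer do not make sense to me. (1) The assumption is P(C|A)=0.9. No one said P(C)=0.9. We assumed P(C)=0.5. (2) How did you get your result for P(E)? By assumption P(woman)=P(man)=0.5, yet you write P(woman)=0.9.
Michael R. Chernick

It is assumed that P(C) has the value 0.5, and that is the value I used. The value of P(E) is the probability of Evidence 2 after Evidence 1 has been applied (which leads to a new hypothesis that the probability of the person being female is 0.9). P(E) = (probability that the person is a woman (given Evidence 1) * probability that the person has AX3 if a woman) + (probability that the person is a man (given Evidence 1) * probability that the person has AX3 if a man) = (0.9 * 0.8) + (0.1 * 0.2) = 0.74
RandomAnswer 2012

Your definition of the probability of E is a bit confusing, and the terms you use to compute it look different from what you wrote earlier. It doesn't really matter. The answer is apparently correct, based on the nicely presented answer by Huu.
Michael R. Chernick 2012

@Michael Unless Huu made mistakes.
whuber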

2
This answer is simply wrong. There may be other errors, but this one is glaring. You state a definitive answer for P("Has Long Hair") (your P(A)), and then use that to give your final definitive answer. There simply isn't enough information to determine this, even assuming P(F) = 0.5. Your line to calculate P(A) seems to come from nowhere. Here is the correct formula using Bayes' theorem: P(A) = P(A|F)P(F)/P(F|A), from which, using your stated assumptions, we get P(A) = P(A|F)*5/9. However, we still don't know P(A|F), which could be anything.
Bogdanovist

0

Question Restatement and Generalisation

A, B, and C are binary unknowns whose possible values are 0 and 1. Let $Z_i$ stand for the proposition, "The value of $Z$ is $i$". Also let $(X|Y)$ stand for "The probability that $X$, given that $Y$". What is $(A_a|B_bC_cI)$, given that

  1. $(A_{a_1}|B_{b_1}I)=u_1$ and $(A_{a_2}|C_{c_2}I)=u_2$
  2. $(A_{a_1}|B_{b_1}I)=u_1$ and $(A_{a_2}|C_{c_2}I)=u_2$ and $(BC|I)=(B|I)(C|I)$
  3. $(A_{a_1}|B_{b_1}I)=u_1$ and $(A_{a_2}|C_{c_2}I)=u_2$ and $(A_0|I)=\frac{1}{2}$
  4. $(A_{a_1}|B_{b_1}I)=u_1$ and $(A_{a_2}|C_{c_2}I)=u_2$ and $(A_0|I)=\frac{1}{2}$ and $(BC|I)=(B|I)(C|I)$

and that $I$ contains no relevant information besides what is implicit in the assignments? The last conjunct of conditions 2 and 4 is shorthand for the independence statement

$$(B_jC_k|I)=(B_j|I)(C_k|I),\quad j=0,1,\quad k=0,1$$

Treat each of the four cases in turn.

Answers

Case 1

We have to specify the distribution (ABC|I). The problem is underdetermined, because (ABC|I) requires eight numbers, but we have only three equations---the two given conditions and the normalisation condition.

It has been shown by various esoteric means that the distribution to assign when the information doesn't otherwise determine a solution is the one that, of all distributions consistent with the known information, has the greatest entropy. Any other distribution implies that we know more than the known information, which of course is a contradiction.

All we need to do, therefore, is assign the maximum entropy distribution. This is more easily said than done, and I have not found a general closed-form solution. But particular solutions can be found using a numerical optimiser. We maximise

$$-\sum_{i,j,k}(A_iB_jC_k|I)\ln(A_iB_jC_k|I)$$
subject to the constraints
$$\sum_{i,j,k}(A_iB_jC_k|I)=1$$
and
$$(A_{a_1}|B_{b_1}I)=u_1 \quad\text{i.e.}\quad \frac{\sum_{k}(A_{a_1}B_{b_1}C_k|I)}{\sum_{i,k}(A_iB_{b_1}C_k|I)}=u_1$$
and
$$(A_{a_2}|C_{c_2}I)=u_2 \quad\text{i.e.}\quad \frac{\sum_{j}(A_{a_2}B_jC_{c_2}|I)}{\sum_{i,j}(A_iB_jC_{c_2}|I)}=u_2$$
Now let's apply this to the question. If we have

  1. "The person is female" $\equiv A_1$
  2. "The person has long hair" $\equiv B_1$
  3. "The person has blood type AX3" $\equiv C_1$

then $a=1$, $b=1$, $c=1$, $a_1=1$, $b_1=1$, $a_2=1$, $c_2=1$, $u_1=0.9$, $u_2=0.8$, and we find that for the maximum entropy solution, $(A_1|B_1C_1I)\approx 0.932$. Therefore the probability that the person behind the curtain is female, given that he/she has long hair and blood type AX3, is 0.932.

Case 2

Now we repeat the exercise with the extra constraint that for a given person, knowing the value of B (the hair state) does not affect our estimate of the value of C (the blood type state), and vice versa. Everything is the same as in Case 1, except there are two extra constraints in the optimisation, namely:

$$(B_0|C_lI)=(B_0|I),\quad l=0,1$$
i.e.
$$\frac{\sum_{i}(A_iB_0C_l|I)}{\sum_{i,j}(A_iB_jC_l|I)}=\sum_{i,k}(A_iB_0C_k|I),\quad l=0,1$$
This gives $(A_1|B_1C_1I)\approx 0.936$, so the probability that the person behind the curtain is female, given that he/she has long hair and blood type AX3, is 0.936.

Case 3

Now we remove the independence condition and replace it with the prior condition that there is an equal chance that a given person is male or female:

$$(A_0|I)=\frac{1}{2} \quad\text{i.e.}\quad \sum_{j,k}(A_0B_jC_k|I)=\frac{1}{2}$$
This time $(A_1|B_1C_1I)\approx 0.973$, so the probability that the person behind the curtain is female, given that he/she has long hair and blood type AX3, is 0.973.

Case 4

Finally we reintroduce the independence constraints of Case 2, and find that $(A_1|B_1C_1I)\approx 0.989$. Therefore the probability that the person behind the curtain is female, given that he/she has long hair and blood type AX3, is 0.989.


-2

I believe now that, if we assume a ratio of men and women in the population at large, then there is a single indisputable answer.

A = the event that the person has long hair

B = the event that the person has blood type AX3

C = the event that the person is female

P(C|A) = 0.9

P(C|B) = 0.8

P(C) = 0.5 (i.e. let's assume an equal ratio of men and women in the population at large)

Then P(C|A and B) = [P(C|A) x P(C|B) / P(C)] / [[P(C|A) x P(C|B) / P(C)] + [[1-P(C|A)] x [1-P(C|B)] / [1-P(C)]]]

in this case, P(C|A and B) = 0.972973
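
One way to arrive at a formula of this shape, offered as a sketch rather than as the derivation the author had in mind, is to assume that A and B are conditionally independent given C and also given not-C (written $\bar C$ below). That assumption is an addition to the problem statement, and it is not the same thing as the unconditional independence of A and B.

```latex
\begin{aligned}
P(C \mid A,B)
  &= \frac{P(A \mid C)\,P(B \mid C)\,P(C)}
          {P(A \mid C)\,P(B \mid C)\,P(C) + P(A \mid \bar C)\,P(B \mid \bar C)\,P(\bar C)} \\[4pt]
P(A \mid C)\,P(B \mid C)\,P(C)
  &= \frac{P(C \mid A)\,P(C \mid B)}{P(C)}\;P(A)\,P(B) \\[4pt]
P(A \mid \bar C)\,P(B \mid \bar C)\,P(\bar C)
  &= \frac{[1-P(C \mid A)]\,[1-P(C \mid B)]}{1-P(C)}\;P(A)\,P(B) \\[4pt]
P(C \mid A,B)
  &= \frac{0.9 \times 0.8 / 0.5}{0.9 \times 0.8 / 0.5 + 0.1 \times 0.2 / 0.5}
   = \frac{1.44}{1.48} = \frac{36}{37} \approx 0.972973
\end{aligned}
```

The factor $P(A)\,P(B)$ appears in both terms of the denominator and cancels, which is why the result does not depend on how common long hair or AX3 is under this particular assumption.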


P(C|A and B) = P(A and B and C)/P(A and B) = P(A and B and C)/[P(A|B) P(B)]. How did you get your formula?
Michael R. Chernick

There is probably a way to add conditions so that you get a unique answer.
Michael R. Chernick

To add: by independence of A and B, the formula simplifies to P(A and B and C)/[P(A) P(B)] = P(B and C|A)/P(B).
Michael R. Chernick

2
The intent of my question was really for you to justify the formula. I don't understand how it would be derived.
Michael R. Chernick

2
No, the answer that supposedly used Bayes' rule is incorrect. I'm not sure why you are confused: MC's formula above is correct and cannot be used to get any result. That's what his and Whuber's answers to the question explained!
Bogdanovist

-2

Note: In order to get a definitive answer, the calculations below assume that the probabilities of a person, a long-haired man, and a long-haired woman having AX3 are approximately the same. If more accuracy is desired, this should be verified.

You start out with the knowledge that the person has long hair, so at this point the odds are:

90:10

Note: The ratio of males to females in the general population does not affect this first step. For example, if there were 1 female in a hundred in the general population, a randomly-selected long-haired person would still be a female 90% of the time. The ratio of females to males DOES matter once the second piece of evidence is combined with the first! (see the update below for details)

Next, we learn that the person has AX3. Because AX3 is unrelated to long hair, the ratio of men to women is known to be 50:50, and we have assumed that the probabilities are the same, we can simply multiply the corresponding sides of the two odds and normalize so that the two sides sum to 100:

(90:10) * (80:20)
==> 7200:200

    Normalize by dividing each side by (7200+200)/100 = 74

==> 7200/74:200/74
==> 97.297.. : 2.702..

Thus, the chance that the person behind the curtain is female is approximately 97.297%.

UPDATE

Here's a further exploration of the problem:

Definitions:

f - number of females
m - number of males
fl - number of females with long hair
ml - number of males with long hair
fx - number of females with AX3
mx - number of males with AX3
flx - number of females with long hair and AX3
mlx - number of males with long hair and AX3
pfl - probability that a female has long hair
pml - probability that a male has long hair
pfx - probability that a female has AX3
pmx - probability that a male has AX3

First, we are given that 90% of long-haired people are females, and 80% of people with AX3 are female, so:

fl = 9 * ml
pfl = fl / f
pml = ml / m 
    = fl / (9 * m)

fx = 4 * mx
pfx = fx / f
pmx = mx / m 
    = fx / (4 * m)

Because we assumed that the probability of AX3 is independent of gender and long hair, our calculated pfx will apply to women with long hair, and pmx will apply to men with long-hair to find the number of them that likely have AX3:

flx = fl * pfx 
    = fl * (fx / f) 
    = (fl * fx) / f
mlx = ml * pmx 
    = (fl / 9) * (fx / (4 * m)) 
    = (fl * fx) / (36 * m)

Thus, the likely ratio of the number of females with long-hair and AX3 to the number of males with long-hair and AX3 is:

flx             :   mlx
(fl * fx) / f   :   (fl * fx) / (36 * m)
1/f             :   1 / (36m)
36m             :   f

Because it is given that males and females are equal in number (50:50), m and f cancel and you end with 36 females to every male. Otherwise, there are 36*m/f females for every male in the specified subgroup. For example, if there were twice as many women as men, there would be 18 females to each male among those who have long hair and AX3.
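
The 36*m/f result can be checked with exact arithmetic. This is only a sketch under the same assumption as the update above, namely that AX3 status is independent of hair length within each sex; the function name `females_per_male` and its parameters are hypothetical.

```python
from fractions import Fraction


def females_per_male(males, females,
                     female_share_long=Fraction(9, 10),
                     female_share_ax3=Fraction(4, 5)):
    """Females per male among long-haired AX3 people, i.e. 36 * m / f.

    Assumes AX3 status is independent of hair length within each sex.
    """
    odds_long = female_share_long / (1 - female_share_long)   # fl : ml = 9 : 1
    odds_ax3 = female_share_ax3 / (1 - female_share_ax3)      # fx : mx = 4 : 1
    return odds_long * odds_ax3 * Fraction(males, females)


ratio = females_per_male(males=1, females=1)
# 36 females per male, i.e. a share of 36/37, about 97.3%
print(ratio, float(ratio / (ratio + 1)))
```

Passing other values for `males` and `females` shows how the answer shifts with the population sex ratio while the 90% and 80% figures stay fixed.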


1
This solution relies on assuming more than is currently stated in the problem: namely, that long hair, AX3, and gender are independent. Otherwise, you cannot justify "applying" pfx to women with long hair, etc.
whuber

@whuber: Yes, I do make that assumption. However, isn't the purpose of probability to give the best approximation based on the data that you have? Thus, since you already know that long-hair and AX3 are independent for the general population, you SHOULD carry forward that assumption to males and females until you explicitly learn otherwise. Granted, it is not a universally correct one, but it is the best one you can make until you get more info. Q: With only the current data, if you had to give the % chance that it was a woman behind the curtain, would you really say "between 0 and 100%"?
Briguy37

1
We have an important difference in philosophy, @Briguy. I strongly believe in not making unfounded assumptions. It is not clear in what sense the mutual independence assumption is "best": I will grant it may be in certain applications. But in general, that seems dangerous to me. I would prefer being clear about the assumptions needed to solve a problem, so people can decide whether it is worthwhile collecting the data to check those assumptions, rather than assuming things that are mathematically convenient for the sake of obtaining an answer. That's the difference between stats and math.
whuber

To answer your question: yes, 0% - 100% is exactly the answer I would give. (I have given similar answers to comparable questions on this site.) That range accurately reflects the uncertainty. This issue is closely related to the Ellsberg paradox. Ellsberg's original paper is well written and clear: I recommend it.
whuber

@whuber: Thanks for taking the time to dialogue with me. I see your point about the importance of thinking through and listing the assumptions made, and have updated my answer accordingly. However, in regards to your answer, I believe it is incomplete. The reason for this is that you can consider all unknown cases and find the average probability across all of them to arrive at your final answer. E.g., though both are still possible, probabilities above 50% are much more prevalent than probabilities below 50% across all cases, so we are surely better off guessing that it is a woman.
Briguy37

-4

98% Female, simple interpolation. First premise 90% female, leaves 10%, second premise only leaves 2% of the existing 10%, hence 98% female

Licensed under cc by-sa 3.0 with attribution required.