Throwing balls into bins: estimating a lower bound on the probability



Although it may look like it, this is not homework. Any references are welcome. :-)

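Scenario: $n$ distinct balls and $n$ distinct bins (labeled $1$ to $n$ from left to right). Each ball is thrown into a bin independently and uniformly at random. Let $f(i)$ be the number of balls in the $i$-th bin. Let $E_i$ denote the following event.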

For every $j \le i$, $\sum_{k \le j} f(k) \le j - 1$.

That is, for every $j \le i$, the first $j$ bins (the leftmost $j$ bins) contain fewer than $j$ balls.

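Question: estimate $\sum_{i<n}\Pr(E_i)$ in terms of $n$ as $n$ goes to infinity. A lower bound is preferred. I don't think there is an easy-to-compute formula.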

$\Pr(E_n) = 0$, and $\lim_{n\to\infty}\Pr(E_1) = \lim_{n\to\infty}\left(\frac{n-1}{n}\right)^n = \frac{1}{e}$.

My guess: $\sum_{i<n}\Pr(E_i) = \ln n$ as $n$ goes to infinity. I considered the first $\ln n$ terms of the sum.
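For concrete numbers at finite $n$, every $\Pr(E_i)$ can be computed exactly with a small dynamic program over the bins. This is a sketch in Python, and the function name `prob_E` is illustrative. It relies on one fact: given $b$ balls in bins $1,\dots,j-1$, the remaining $n-b$ balls are uniform over bins $j,\dots,n$. Bin $j$ therefore receives a Binomial$(n-b, \frac{1}{n-j+1})$ number of balls, and we keep only the paths on which the first $j$ bins stay below $j$ balls.

```python
from math import comb

def prob_E(n):
    """Return [Pr(E_1), ..., Pr(E_n)] for n balls thrown uniformly into n bins."""
    # dist[b] = Pr(the first j bins hold exactly b balls and E_j holds)
    dist = {0: 1.0}
    probs = []
    for j in range(1, n + 1):
        p = 1.0 / (n - j + 1)          # each unplaced ball lands in bin j w.p. p
        new_dist = {}
        for b, weight in dist.items():
            m = n - b                   # balls not in bins 1..j-1
            # bin j gets l ~ Binomial(m, p) balls; keep only b + l <= j - 1
            for l in range(j - b):
                q = comb(m, l) * p**l * (1.0 - p)**(m - l)
                new_dist[b + l] = new_dist.get(b + l, 0.0) + weight * q
        dist = new_dist
        probs.append(sum(dist.values()))
    return probs

n = 100
print(sum(prob_E(n)[:-1]))  # sum_{i<n} Pr(E_i) for this n
```

The cost is roughly cubic in $n$, which is fine for a few hundred bins.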


It looks like a special case of the birthday problem. –
Gopi

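@Gopi I can't convince myself that my problem is a birthday-problem bound. Could you explain it explicitly? Thank you very much. Note: the constraint is on the total number of balls in the first $j$ bins, not on the number of balls in a particular bin.
张鹏 2012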

Indeed, my bad. After rereading the Wikipedia article on the birthday problem, I realized I was thinking of a different problem adapted from it.
Gopi 2012

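Some possibly incorrect thoughts... Consider how to encode a state: read the bins from left to right. If a bin contains $i$ balls, output a sequence of $i$ ones followed by a 0, and do this for every bin from left to right. You seem to be interested in the largest $i$ such that this binary string (with $n$ zeros and $n$ ones) has not yet contained more ones than zeros. Now let's take a leap of faith and generate 0s and 1s with equal probability $1/2$ (this might be complete nonsense). This problem is then related to Catalan numbers and Dyck words. And...???
Sariel Har-Peled 2012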

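I don't see why your definition that the balls are distinct matters. Also, the string encoding takes into account the fact that the bins are distinct.
Sariel Har-Peled 2012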

Answers:



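Edit (2014-08-08): As Douglas Zare pointed out in the comments, the argument below is incorrect, in particular the "bridge" between the two probabilities. I don't see a direct way to fix it. I'll leave the answer here because I believe it still provides some intuition, but be aware that $\Pr(E_m) \le \prod_{l=1}^{m}\Pr(F_l)$ is not true in general.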

This is not a complete answer, but hopefully it has enough content for you, or someone more proficient than me, to finish it.

Consider the probability that exactly $k$ balls fall into the first $l$ (of $n$) bins:

$$\binom{n}{k}\left(\frac{l}{n}\right)^k\left(\frac{n-l}{n}\right)^{n-k}$$

Let $F_l$ be the event that fewer than $l$ balls fall into the first $l$ bins:

$$\Pr(F_l) = \sum_{k=0}^{l-1}\binom{n}{k}\left(\frac{l}{n}\right)^k\left(\frac{n-l}{n}\right)^{n-k}$$
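$\Pr(F_l)$ is just a binomial CDF, so it is easy to evaluate. This Python sketch computes it and also forms the product $\prod_{l=1}^{m}\Pr(F_l)$ used in the next step. It then compares that product with the exact $\Pr(E_2)$, which is the check Douglas Zare makes in the comments. All function names are illustrative. `exact_E2` uses the direct count: bin 1 is empty and bin 2 holds at most one ball.

```python
from math import comb

def binom_cdf(k, n, p):
    """F(k; n, p) = Pr(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def prob_F(l, n):
    """Pr(F_l): fewer than l of the n balls land in the first l of the n bins."""
    return binom_cdf(l - 1, n, l / n)

def product_bound(m, n):
    """prod_{l=1}^{m} Pr(F_l), the quantity compared against Pr(E_m)."""
    result = 1.0
    for l in range(1, m + 1):
        result *= prob_F(l, n)
    return result

def exact_E2(n):
    """Exact Pr(E_2): bin 1 is empty and bin 2 holds at most one ball."""
    return (1 - 2 / n) ** n + n * (1 / n) * (1 - 2 / n) ** (n - 1)

print(exact_E2(100), product_bound(2, 100))
```

For $n = 100$ the exact probability comes out larger than the product. The events $F_l$ are positively correlated, so the product is not an upper bound.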

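The event $E_m$ is exactly $F_1 \cap \dots \cap F_m$. My claim is that it is less likely than it would be if each of the events $F_l$ happened independently and all at once. This gives us the bridge between the two: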

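$$\Pr(E_m) \le \prod_{l=1}^{m}\Pr(F_l) = \prod_{l=1}^{m}\left(\sum_{k=0}^{l-1}\binom{n}{k}\left(\frac{l}{n}\right)^k\left(\frac{n-l}{n}\right)^{n-k}\right) = \prod_{l=1}^{m}F\left(l-1; n, \frac{l}{n}\right)$$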

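where $F(k; n, p)$ is the cumulative distribution function of the binomial distribution. Reading just a few lines further down the Wikipedia page, and noting that $l - 1 \le pn$ with $p = \frac{l}{n}$, we can use the Chernoff inequality to get: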

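$$\Pr(E_m) \le \prod_{l=1}^{m}\exp\left[-\frac{1}{2l}\right] = \exp\left[-\frac{1}{2}\sum_{l=1}^{m}\frac{1}{l}\right] = \exp\left[-\frac{1}{2}H_m\right] \approx \exp\left[-\frac{1}{2}\left(\frac{1}{2m} + \ln(m) + \gamma\right)\right]$$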

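where $H_m$ is the $m$-th harmonic number and $\gamma$ is the Euler-Mascheroni constant. The estimate of $H_m$ is taken from the inequality on the linked Wolfram MathWorld page.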

Not worrying about the factor of $e^{-1/(4m)}$, this finally gives us:

$$\Pr(E_m) \lesssim \frac{e^{-\gamma/2}}{\sqrt{m}}$$
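Two steps in this chain can be checked numerically. The first is the Chernoff step $F(l-1; n, l/n) \le e^{-1/(2l)}$. The second is that $e^{-H_m/2}$ approaches $e^{-\gamma/2}/\sqrt{m}$. This is a minimal Python sketch with illustrative function names. The constant `EULER_GAMMA` is hardcoded because the standard `math` module does not provide it. Passing these checks does not rescue the first inequality in the chain, which the edit above flags as false.

```python
from math import comb, exp, sqrt

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant, hardcoded

def binom_cdf(k, n, p):
    """F(k; n, p) = Pr(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def chernoff_ok(l, n):
    """Check the Chernoff step F(l-1; n, l/n) <= exp(-1/(2l))."""
    return binom_cdf(l - 1, n, l / n) <= exp(-1 / (2 * l))

def harmonic(m):
    """H_m = 1 + 1/2 + ... + 1/m."""
    return sum(1 / k for k in range(1, m + 1))

def bound_vs_asymptotic(m):
    """Ratio of exp(-H_m / 2) to e^{-gamma/2} / sqrt(m); about exp(-1/(4m))."""
    return exp(-harmonic(m) / 2) / (exp(-EULER_GAMMA / 2) / sqrt(m))

print(all(chernoff_ok(l, 200) for l in range(1, 201)), bound_vs_asymptotic(1000))
```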

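Below is a log-log plot of $\Pr(E_m)$ as a function of $m$ for $n = 2048$, with each point averaged over 100,000 instances. The function $e^{-\gamma/2}/\sqrt{m}$ is also plotted for reference: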

[Figure: log-log plot of $\Pr(E_m)$ versus $m$ for $n = 2048$, with $e^{-\gamma/2}/\sqrt{m}$ for reference]

The form of the function seems right, though the constant is off.

Below is a log-log plot of $\Pr(E_m)$ versus $m$ for varying $n$. Each point is the average over 100,000 instances:

[Figure: log-log plot of $\Pr(E_m)$ versus $m$ for several values of $n$]

Finally, turning to the original question you wanted answered: since $\Pr(E_m) \lesssim \frac{1}{\sqrt{m}}$, we have:

$$\sum_{i<n}\Pr(E_i) = O(\sqrt{n})$$

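As a numerical check, below is a log-log plot of the sum $S_n$ versus the instance size $n$. Each point is the average over 100,000 instances. The function $x^{1/2}$ is plotted for reference: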

[Figure: log-log plot of the sum $S_n$ versus $n$, with $x^{1/2}$ for reference]
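The plots above come from simulation, but the simulation code was not posted. This is a minimal Monte Carlo sketch of that kind of experiment, not the answerer's own code. The function `simulate` and its `trials` and `seed` parameters are hypothetical. Each trial throws $n$ balls, scans the prefix counts, and records every $i$ for which $E_i$ holds.

```python
import random

def simulate(n, trials, seed=0):
    """Monte Carlo estimates of [Pr(E_1), ..., Pr(E_{n-1})] and their sum S_n."""
    rng = random.Random(seed)
    hits = [0] * n                      # hits[i] counts trials where E_i holds
    for _ in range(trials):
        counts = [0] * n
        for _ in range(n):
            counts[rng.randrange(n)] += 1
        prefix = 0
        for j in range(1, n):
            prefix += counts[j - 1]
            if prefix >= j:             # first j bins hold j or more balls
                break                   # so E_j and every later E_i fail
            hits[j] += 1
    probs = [h / trials for h in hits[1:]]
    return probs, sum(probs)

probs, s = simulate(50, 20000)
print(probs[0], s)
```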

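Although I don't think there is a direct connection, the trick and the final form of this problem have a lot in common with the birthday problem, as was originally guessed in the comments.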


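How do you get $\Pr(E_2) \le \Pr(F_1)\times\Pr(F_2)$? For example, for $n = 100$ I calculate $\Pr(E_2) = 0.267946 > 0.14761 = \Pr(F_1)\Pr(F_2)$. If you are told that the first bin is empty, doesn't that make it more likely that the first two bins contain at most $1$ ball? It does, so $\Pr(F_1)\Pr(F_2)$ is an underestimate.
Douglas Zare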

@DouglasZare, I've verified your calculation, and you are correct. That's what I get for not being more rigorous.
user834 2014


The answer is $\Theta(\sqrt{n})$.

First, let's compute $\Pr(E_{n-1})$.

Suppose we throw $n$ balls into $n$ bins and look at the probability that a given bin has exactly $k$ balls in it. This probability is asymptotically Poisson: as $n$ goes to $\infty$, the probability that a given bin holds exactly $k$ balls is $\frac{1}{e}\frac{1}{k!}$.

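When $n$ balls are thrown into $n$ bins, the probability that bin $j$ receives exactly $k_j$ balls for each $j = 1, \dots, n$ is $\frac{n!}{n^n}\prod_{j=1}^{n}\frac{1}{k_j!}$. This is proportional to $\prod_{j=1}^{n}\frac{1}{k_j!}$, the same as for $n$ independent Poisson(1) variables conditioned on summing to $n$.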

Now consider a random walk that goes from $t$ to $t+1-k$ with probability $\frac{1}{e}\frac{1}{k!}$. I claim the following: condition on the event that this random walk returns to 0 after $n$ steps. Then the probability that it always stays above 0 is the probability that the OP wants to calculate. Why? The height of this random walk after $s$ steps is $s$ minus the number of balls in the first $s$ bins.
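Two facts behind this reduction can be checked by brute force for small $n$. First, the multinomial bin counts have the same law as independent Poisson(1) counts conditioned on summing to $n$. Second, $E_{n-1}$ is exactly the event that the walk with steps $1 - k_j$ stays at height at least 1 before its last step. This Python sketch uses illustrative names throughout. It relies on the fact that a sum of $n$ independent Poisson(1) variables is Poisson($n$).

```python
from itertools import product
from math import exp, factorial, isclose, prod

def multinomial_prob(ks):
    """Pr(bin j gets exactly ks[j] balls for every j), n balls into n bins."""
    n = len(ks)
    return factorial(n) / (n ** n * prod(factorial(k) for k in ks))

def conditioned_poisson_prob(ks):
    """Same counts under i.i.d. Poisson(1), conditioned on the total being n."""
    n = len(ks)
    joint = prod(exp(-1) / factorial(k) for k in ks)
    total = exp(-n) * n ** n / factorial(n)   # Pr(Poisson(n) = n)
    return joint / total

def walk_stays_positive(ks):
    """Height after s steps is s minus the balls in bins 1..s; need >= 1 before the end."""
    height = 0
    for k in ks[:-1]:
        height += 1 - k
        if height < 1:
            return False
    return True

def compositions(n):
    return (ks for ks in product(range(n + 1), repeat=n) if sum(ks) == n)

def check(n):
    return all(isclose(multinomial_prob(ks), conditioned_poisson_prob(ks))
               for ks in compositions(n))

def prob_E_last(n):
    """Pr(E_{n-1}) computed as Pr(the walk stays positive)."""
    return sum(multinomial_prob(ks) for ks in compositions(n) if walk_stays_positive(ks))

print(check(4), prob_E_last(4))
```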

If we had chosen a random walk with probability $\frac{1}{2}$ of going up or down 1 on each step, this would be the classical ballot problem, for which the answer is $\frac{1}{2(n-1)}$. This is a variant of the ballot problem which has been studied (see this paper), and the answer is still $\Theta(\frac{1}{n})$. I don't know whether there is an easy way to compute the constant in the $\Theta(\frac{1}{n})$ for this case.
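The classical value $\frac{1}{2(n-1)}$ can be confirmed for small even $n$ by enumerating every $\pm 1$ walk of length $n$ that returns to 0. The sketch counts the fraction that stays strictly positive at steps $1,\dots,n-1$, using exact fractions. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def ballot_probability(n):
    """Pr(a +/-1 walk of n steps, conditioned to end at 0, is > 0 at steps 1..n-1)."""
    returning = positive = 0
    for steps in product((1, -1), repeat=n):
        if sum(steps) != 0:
            continue
        returning += 1
        height, ok = 0, True
        for s in steps[:-1]:
            height += s
            if height <= 0:
                ok = False
                break
        if ok:
            positive += 1
    return Fraction(positive, returning)

for n in (4, 6, 8):
    print(n, ballot_probability(n), Fraction(1, 2 * (n - 1)))
```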

The same paper shows that when the random walk is conditioned to end at height $k$, the probability of always staying positive is $\Theta(k/n)$ as long as $k = O(\sqrt{n})$. This fact will let us estimate $\Pr(E_s)$ for any $s$.

I'm going to be a little handwavy for the rest of my answer, but standard probability techniques can be used to make this rigorous.

We know that as $n$ goes to $\infty$, this random walk converges to a Brownian bridge, i.e., Brownian motion conditioned to start and end at 0. By general probability theorems, for $\epsilon n < s < (1-\epsilon)n$ the random walk is roughly $\Theta(\sqrt{n})$ away from the x-axis. If it has height $t > 0$, the probability that it has stayed above 0 for the entire time before $s$ is $\Theta(t/s)$. Since $t$ is likely to be $\Theta(\sqrt{n})$ when $s = \Theta(n)$, we have $\Pr(E_s) \in \Theta(1/\sqrt{n})$. Summing over the $\Theta(n)$ values of $s$ in this range gives a total of order $\sqrt{n}$.



[Edit 2014-08-13: Thanks to a comment by Peter Shor, I have changed my estimate of the asymptotic growth rate of this series.]

My belief is that $\sum_{i<n}\Pr(E_i)$ grows as $\sqrt{n}$ when $n \to \infty$. I do not have a proof, but I think I have a convincing argument.

Let $B_i = f(i)$ be a random variable that gives the number of balls in bin $i$. Let $B_{i,j} = \sum_{k=i}^{j} B_k$ be a random variable that gives the total number of balls in bins $i$ through $j$ inclusive.

You can now write $\Pr(E_i) = \sum_{b<j}\Pr(E_j \wedge B_{1,j} = b)\,\Pr(E_i \mid E_j \wedge B_{1,j} = b)$ for any $j < i$. To that end, let's introduce the functions $\pi$ and $g_i$.

$$\pi(j,k,b) = \Pr(B_j = k \mid B_{1,j-1} = b) = \binom{n-b}{k}\left(\frac{1}{n-j+1}\right)^k\left(\frac{n-j}{n-j+1}\right)^{n-b-k}$$

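$$g_i(j,k,b) = \Pr(E_i \wedge B_{j,i} \le k \mid E_{j-1} \wedge B_{1,j-1} = b) = \begin{cases} 0 & k < 0 \\ 1 & k \ge 0 \wedge j > i \\ \sum_{l=0}^{j-b-1}\pi(j,l,b)\,g_i(j+1, k-l, b+l) & \text{otherwise} \end{cases}$$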

We can write Pr(Ei) in terms of gi:

$$\Pr(E_i) = g_i(1, i-1, 0)$$

Now, it's clear from the definition of $g_i$ that

$$\Pr(E_i) = \frac{(n-i)^{n-i+1}}{n^n}\,h_i(n)$$

where $h_i(n)$ is a polynomial in $n$ of degree $i-1$. This makes some intuitive sense too: at least $n-i+1$ balls will have to be put in the $(i+1)$th through $n$th bins (of which there are $n-i$).
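The recursion for $g_i$ can be run with exact rational arithmetic, which makes the polynomial form testable for small $i$. This Python sketch transcribes $\pi$ and $g_i$ as defined above. The name `prob_E_exact` is illustrative, and the particular values $h_1(n) = 1$ and $h_2(n) = 2n - 2$ used in the check come from working out $i = 1, 2$ by hand.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

def prob_E_exact(i, n):
    """Pr(E_i) = g_i(1, i-1, 0), with pi and g_i transcribed from the recursion."""
    def pi(j, k, b):
        # Pr(B_j = k | B_{1,j-1} = b): the n-b unplaced balls are uniform over bins j..n
        q = Fraction(1, n - j + 1)
        return comb(n - b, k) * q**k * (1 - q)**(n - b - k)

    @lru_cache(maxsize=None)
    def g(j, k, b):
        if k < 0:
            return Fraction(0)
        if j > i:
            return Fraction(1)
        # bin j may take l balls only while the first j bins stay below j balls
        return sum((pi(j, l, b) * g(j + 1, k - l, b + l) for l in range(j - b)),
                   Fraction(0))

    return g(1, i - 1, 0)

for n in range(3, 7):
    # Pr(E_2) * n^n should equal (n-2)^(n-1) * h_2(n) with h_2(n) = 2n - 2
    print(n, prob_E_exact(2, n) * n ** n, (n - 2) ** (n - 1) * (2 * n - 2))
```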

Since we're only talking about $\Pr(E_i)$ when $n \to \infty$, only the leading coefficient of $h_i(n)$ is relevant; let's call this coefficient $a_i$. Then

$$\lim_{n\to\infty}\Pr(E_i) = \frac{a_i}{e^i}$$

How do we compute $a_i$? Well, this is where I'll do a little handwaving. If you work out the first few $E_i$, you'll see that a pattern emerges in the computation of this coefficient. You can write it as

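$$a_i = \mu_i(1, i-1, 0)$$
where
$$\mu_i(j,k,b) = \begin{cases} 0 & k < 0 \\ 1 & k \ge 0 \wedge j > i \\ \sum_{l=0}^{j-b-1}\frac{1}{l!}\,\mu_i(j+1, k-l, b+l) & \text{otherwise} \end{cases}$$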

Now, I wasn't able to derive a closed-form equivalent directly, but I computed the first 20 values of $\lim_{n\to\infty}\Pr(E_i) = a_i/e^i$:

i       a_i/e^i
1       0.367879
2       0.270671
3       0.224042
4       0.195367
5       0.175467
6       0.160623
7       0.149003
8       0.139587
9       0.131756
10      0.12511
11      0.119378
12      0.114368
13      0.10994
14      0.105989
15      0.102436
16      0.0992175
17      0.0962846
18      0.0935973
19      0.0911231
20      0.0888353

Now, it turns out that

$$\lim_{n\to\infty}\Pr(E_i) = \frac{i^i}{i!\,e^i} = \mathrm{Pois}(i; i)$$

where $\mathrm{Pois}(i; \lambda)$ is the probability that a random variable $X$ takes the value $i$ when it's drawn from a Poisson distribution with mean $\lambda$. Thus we can write our sum as

$$\lim_{n\to\infty}\sum_{i=1}^{n}\Pr(E_i) = \sum_{x=1}^{\infty}\frac{x^x}{x!\,e^x}$$

Wolfram Alpha tells me this series diverges. Peter Shor points out in a comment that Stirling's approximation allows us to estimate $\Pr(E_i)$:

$$\lim_{n\to\infty}\Pr(E_x) = \frac{x^x}{x!\,e^x} \approx \frac{1}{\sqrt{2\pi x}}$$

Let

$$\phi(x) = \frac{1}{\sqrt{2\pi x}}$$

Since

  • $\lim_{x\to\infty}\frac{\phi(x)}{\phi(x+1)} = 1$
  • $\phi(x)$ is decreasing
  • $\int_1^n \phi(x)\,dx \to \infty$ as $n \to \infty$

our series grows as $\int_1^n \phi(x)\,dx$ (see e.g. Theorem 2). That is,

$$\sum_{i=1}^{n}\Pr(E_i) = \Theta(\sqrt{n})$$
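The Stirling estimate and the resulting $\sqrt{n}$ growth can be checked numerically. This Python sketch evaluates $x^x/(x!\,e^x)$ in log space via `math.lgamma`, to avoid overflow, and compares partial sums with $\int_0^n \phi(x)\,dx = \sqrt{2n/\pi}$. Function names are illustrative. The sum here is over the limiting terms only, so it says nothing directly about finite-$n$ corrections.

```python
from math import exp, lgamma, log, pi, sqrt

def limit_term(x):
    """lim_{n->inf} Pr(E_x) = x^x / (x! e^x), computed in log space."""
    return exp(x * log(x) - lgamma(x + 1) - x)

def stirling_term(x):
    """Stirling's approximation 1 / sqrt(2 pi x) of the same quantity."""
    return 1 / sqrt(2 * pi * x)

def partial_sum(n):
    return sum(limit_term(x) for x in range(1, n + 1))

for n in (100, 1000, 10000):
    print(n, partial_sum(n), sqrt(2 * n / pi))
```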

Wolfram Alpha is wrong. Use Stirling's formula. It says that $x^x/(x!\,e^x) \approx 1/\sqrt{2\pi x}$.
Peter Shor

@PeterShor Thanks! I've updated the conclusion thanks to your insight, and now I am in agreement with the other two answers. It's interesting to me to see 3 quite different approaches to this problem.
ruds


Exhaustively checking the first few terms (by examining all $n^n$ cases) and a bit of lookup shows that the answer is https://oeis.org/A036276 / $n^n$. This implies that the answer is asymptotically $\sqrt{n}\cdot\frac{1}{2}\sqrt{\frac{\pi}{2}}$.

More exactly, the answer is:

$$\frac{n!}{2n^n}\sum_{k=0}^{n-2}\frac{n^k}{k!}$$
and there is no closed-form answer.
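The exhaustive check described here is easy to reproduce. This Python sketch enumerates all $n^n$ ways to throw the balls for small $n$ and compares the result with the formula, both computed with exact fractions. It then compares the formula with $\sqrt{n}\cdot\frac{1}{2}\sqrt{\pi/2}$ at a moderate $n$. Function names are illustrative, and the formula is evaluated with integer arithmetic by multiplying through by $n!$.

```python
from fractions import Fraction
from itertools import product
from math import factorial, pi, sqrt

def brute_force_sum(n):
    """sum_{i<n} Pr(E_i) by enumerating all n**n ball-to-bin assignments."""
    total = 0
    for assignment in product(range(n), repeat=n):
        counts = [0] * n
        for b in assignment:
            counts[b] += 1
        prefix = 0
        for j in range(1, n):
            prefix += counts[j - 1]
            if prefix >= j:        # E_j fails, and so does every E_i with i > j
                break
            total += 1             # E_j holds for this assignment
    return Fraction(total, n ** n)

def formula_sum(n):
    """n!/(2 n^n) * sum_{k=0}^{n-2} n^k / k!, using exact integers."""
    numerator = sum(n ** k * (factorial(n) // factorial(k)) for k in range(n - 1))
    return Fraction(numerator, 2 * n ** n)

def asymptotic(n):
    return sqrt(n) * 0.5 * sqrt(pi / 2)

for n in range(2, 7):
    print(n, brute_force_sum(n), formula_sum(n))
print(float(formula_sum(1000)), asymptotic(1000))
```

The ratio approaches 1 slowly, because of lower-order terms.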

OEIS is pretty awesome.
Thomas Ahle
Licensed under cc by-sa 3.0 with attribution required.