Maximum gap between samples drawn without replacement from a discrete uniform distribution


16

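This problem is related to my lab's research on robotic coverage:

Randomly draw $n$ numbers from the set $\{1, 2, \ldots, m\}$ without replacement, and sort the numbers in ascending order, where $1 \le n \le m$.

From this sorted list of numbers $\{a_{(1)}, a_{(2)}, \ldots, a_{(n)}\}$, generate the differences between consecutive numbers and the boundaries: $g = \{a_{(1)},\ a_{(2)} - a_{(1)},\ \ldots,\ a_{(n)} - a_{(n-1)},\ m + 1 - a_{(n)}\}$. This gives $n+1$ gaps.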
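As a concrete illustration of the gap construction, here is a minimal Python sketch (the question's own code is Mathematica; the function name `gaps` is hypothetical). It treats $0$ and $m+1$ as boundary points, which reproduces the first gap $a_{(1)}$ and the last gap $m+1-a_{(n)}$.

```python
import random

def gaps(sample, m):
    """Return the n+1 gaps of a sample drawn from {1, ..., m}.

    The boundaries 0 and m+1 are added so that the first gap is a_(1)
    and the last gap is m + 1 - a_(n), as in the definition of g.
    """
    points = [0] + sorted(sample) + [m + 1]
    return [hi - lo for lo, hi in zip(points, points[1:])]

m, n = 10, 3
sample = random.sample(range(1, m + 1), n)   # draw without replacement
g = gaps(sample, m)
print(sorted(sample), g, max(g))
```

The gaps always sum to $m+1$, which is a quick sanity check on any implementation.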

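What is the distribution of the maximum gap?

$$P(\max(g) = k) = P(k; m, n) = \ ?$$

This can be framed using order statistics: $P(g_{(n+1)} = k) = P(k; m, n) = \ ?$

See link for the distribution of the gaps, but this question asks for the distribution of the maximum gap.

I would be satisfied with the average value, $\mathbb{E}[g_{(n+1)}]$.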

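If $n = m$, all gaps are of size 1. If $n + 1 = m$, there is one gap of size $2$, and $n+1$ possible locations for it. The maximum gap size is $m - n + 1$, and this gap can be placed before or after any of the $n$ numbers, for a total of $n+1$ possible positions. The smallest maximum gap size is $\lceil \frac{m-n}{n+1} \rceil$. Define the probability of any given combination as $T = {m \choose n}^{-1}$.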
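For small $m$ and $n$, the boundary cases described here can be checked by enumerating all $\binom{m}{n}$ subsets. The following Python sketch does that with exact fractions; `max_gap_pmf` is a hypothetical helper name, and the approach is brute force, so it is only practical for small inputs.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def max_gap_pmf(m, n):
    """Exact distribution of the largest of the n+1 gaps, by enumeration."""
    counts = Counter()
    for subset in combinations(range(1, m + 1), n):   # subsets come out sorted
        points = (0,) + subset + (m + 1,)             # add both boundaries
        counts[max(hi - lo for lo, hi in zip(points, points[1:]))] += 1
    total = sum(counts.values())                      # equals C(m, n)
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}

for k, p in max_gap_pmf(6, 3).items():
    print(k, p)
```

With $m = 6$, $n = 3$ the largest possible gap is $m - n + 1 = 4$, and it occurs in $n + 1 = 4$ of the $20$ subsets, which agrees with the value $T(n+1)$.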

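I have partially solved for the probability mass function as

$$P(g_{(n+1)} = k) = P(k; m, n) =
\begin{cases}
0 & k < \lceil \frac{m-n}{n+1} \rceil \\
1 & k = \lceil \frac{m-n}{n+1} \rceil \\
1 & k = 1 \text{ (occurs when } m = n) \\
T(n+1) & k = 2 \text{ (occurs when } m = n+1) \\
T(n+1) & k = \lceil \frac{m(n-1)}{n} \rceil \\
? & \lceil \frac{m(n-1)}{n} \rceil \le k \le m - n + 1 \\
T(n+1) & k = m - n + 1 \\
0 & k > m - n + 1
\end{cases} \tag{1}$$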

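Current work (1): The equation for the first gap, $a_{(1)}$, is straightforward:

$$P(a_{(1)} = k) = P(k; m, n) = \frac{1}{\binom{m}{n}} \sum_{k=1}^{m-n+1} \binom{m-k-1}{n-1}$$

The expected value has a simple value:

$$E[P(a_{(1)})] = \frac{1}{\binom{m}{n}} \sum_{k=1}^{m-n+1} \binom{m-k-1}{n-1}\, k = \frac{m-n}{1+n}.$$

By symmetry, I expect all $n$ gaps to have this distribution. Perhaps the solution could be found by drawing from this distribution $n$ times.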

Current work (2): it is easy to run Monte Carlo simulations.

simMaxGap[m_, n_] := Max[Differences[Sort[Join[RandomSample[Range[m], n], {0, m+1}]]]];
m = 1000; n = 1; trials = 100000;
SmoothHistogram[Table[simMaxGap[m, n], {trials}], Filling -> Axis,
Frame -> {True, True, False, False},
FrameLabel -> {"k (Max gap)", "Probability"},
PlotLabel -> StringForm["m=``,n=``,smooth histogram of maximum gap for `` trials", m, n, trials]]
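For readers without Mathematica, a rough Python counterpart of the simulation might look like the sketch below. The names `sim_max_gap` and `mean_max_gap`, the default trial count, and the fixed seed are all assumptions made for illustration; it estimates the mean of the maximum gap rather than plotting a histogram.

```python
import random

def sim_max_gap(m, n, rng):
    """One trial: largest gap, with 0 and m+1 added as boundary points."""
    points = [0] + sorted(rng.sample(range(1, m + 1), n)) + [m + 1]
    return max(hi - lo for lo, hi in zip(points, points[1:]))

def mean_max_gap(m, n, trials=10_000, seed=0):
    """Monte Carlo estimate of the expected maximum gap."""
    rng = random.Random(seed)          # seeded so runs are repeatable
    return sum(sim_max_gap(m, n, rng) for _ in range(trials)) / trials

print(mean_max_gap(1000, 1))
```

When $n = m$ every trial returns 1, and when $n + 1 = m$ every trial returns 2, so those two cases make convenient smoke tests.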

1
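Under these conditions you must have n <= m. I think you want g = {a_(1), a_(2)-a_(1), ..., a_(n)-a_(n-1)}. Does choosing at random mean that each number is selected with probability 1/m on the first draw? Since you do not replace, the probability is 1/(m-1) on the second draw, and so on, down to 1 on the m-th draw if n = m. If n < m, this stops earlier, with the last (n-th) draw having probability 1/(m-(n-1)).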
Michael R. Chernick

2
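Your original description did not make sense because (I believe) you had transposed two subscripts. Please verify that my edit conforms with your intention: in particular, please confirm that you mean there are $n$ gaps $g$, of which $a_{(1)}$ is the first.
whuber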

1
@gung I think this is research rather than self-study.
Glen_b - Reinstate Monica

1
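I think your minimum and maximum gap sizes should be $1$ and $m-n+1$. The minimum gap size arises when you choose consecutive integers, and the maximum gap size occurs when you choose $m$ and the first $n-1$ integers $1, \ldots, n-1$ (or $1$ and $m-n+2, \ldots, m$).
probabilityislogic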

1
Thank you Michael Chernick and probabilityislogic, your corrections have been made. Thanks @whuber for the corrections!
AaronBecker

Answers:


9

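Let $f(g;n,m)$ be the chance that the minimum, $a_{(1)}$, equals $g$; that is, the sample consists of $g$ and an $(n-1)$-subset of $\{g+1, g+2, \ldots, m\}$. There are $\binom{m-g}{n-1}$ such subsets out of the $\binom{m}{n}$ equally likely subsets, whence

$$\Pr(a_{(1)} = g) = f(g;n,m) = \frac{\binom{m-g}{n-1}}{\binom{m}{n}}.$$

Adding $f(k;n,m)$ for all possible values of $k$ greater than $g$ yields the survival function

$$\Pr(a_{(1)} > g) = Q(g;n,m) = \frac{(m-g)\binom{m-g-1}{n-1}}{n\binom{m}{n}}.$$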
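The two formulas for the minimum can be cross-checked numerically. The Python sketch below (function names `f` and `Q` mirror the symbols in the text; the range guards are my own additions, since `math.comb` rejects negative arguments) uses exact fractions and compares $Q$ against the tail sum of $f$.

```python
from fractions import Fraction
from math import comb

def f(g, n, m):
    """Pr(a_(1) = g): g is the minimum of an n-subset of {1, ..., m}."""
    if g < 1 or g > m:
        return Fraction(0)
    return Fraction(comb(m - g, n - 1), comb(m, n))

def Q(g, n, m):
    """Pr(a_(1) > g), written in the closed form given in the text."""
    if g >= m:                       # guard: comb() needs non-negative input
        return Fraction(0)
    return Fraction((m - g) * comb(m - g - 1, n - 1), n * comb(m, n))

n, m = 3, 8
for g in range(0, m + 1):
    tail = sum(f(k, n, m) for k in range(g + 1, m + 1))
    print(g, Q(g, n, m), tail)       # the last two columns should agree
```

Since $(m-g)\binom{m-g-1}{n-1}/n = \binom{m-g}{n}$, the survival function is simply the chance that all $n$ values land in $\{g+1, \ldots, m\}$.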

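Let $G_{n,m}$ be the random variable given by the largest gap:

$$G_{n,m} = \max\left(a_{(1)},\ a_{(2)} - a_{(1)},\ \ldots,\ a_{(n)} - a_{(n-1)}\right).$$

(This responds to the question as originally framed, before it was modified to include a gap between $a_{(n)}$ and $m$.) We will compute its survival function

$$P(g;n,m) = \Pr(G_{n,m} > g),$$

from which the entire distribution of $G_{n,m}$ is readily derived. The method is a dynamic program beginning with $n=1$, for which

$$P(g;1,m) = \Pr(G_{1,m} > g) = \frac{m-g}{m},\quad g = 0, 1, \ldots, m. \tag{1}$$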

For larger $n>1$, note that the event $G_{n,m} > g$ is the disjoint union of the event

$$a_{(1)} > g,$$

for which the very first gap exceeds $g$, and the $g$ separate events

$$a_{(1)} = k \text{ and } G_{n-1,m-k} > g,\quad k = 1, 2, \ldots, g$$

for which the first gap equals $k$ and a gap greater than $g$ occurs later in the sample. The Law of Total Probability asserts the probabilities of these events add, whence

$$P(g;n,m) = Q(g;n,m) + \sum_{k=1}^{g} f(k;n,m)\, P(g;n-1,m-k). \tag{2}$$

Fixing $g$ and laying out a two-way array indexed by $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$, we may compute $P(g;n,m)$ by using $(1)$ to fill in its first row and $(2)$ to fill in each successive row using $O(gm)$ operations per row. Consequently the table can be completed in $O(gmn)$ operations and all tables for $g=1$ through $g=m-n+1$ can be constructed in $O(m^3 n)$ operations.
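One way to lay out the table is sketched below in Python. This is my reading of the recursion, not the answerer's code: `survival_table` and `brute_force` are hypothetical names, $Q$ is written in the equivalent form $\binom{j-g}{i}/\binom{j}{i}$, and terms with $j-k < i-1$ are skipped because $f$ vanishes there. As in this answer, the gap after $a_{(n)}$ is not included.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def survival_table(g, n, m):
    """P[i][j] = Pr(G_{i,j} > g) for 1 <= i <= n and i <= j <= m."""
    P = [[Fraction(0)] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):                          # first row: equation (1)
        P[1][j] = Fraction(max(j - g, 0), j)
    for i in range(2, n + 1):                          # later rows: equation (2)
        for j in range(i, m + 1):
            # Q(g; i, j) = C(j-g, i) / C(j, i), zero once g exceeds j
            total = Fraction(comb(j - g, i), comb(j, i)) if j >= g else Fraction(0)
            for k in range(1, min(g, j - i + 1) + 1):  # f(k; i, j) = 0 beyond j-i+1
                fk = Fraction(comb(j - k, i - 1), comb(j, i))
                total += fk * P[i - 1][j - k]
            P[i][j] = total
    return P

def brute_force(g, n, m):
    """Pr(G_{n,m} > g) by enumerating every n-subset (small cases only)."""
    hits = total = 0
    for s in combinations(range(1, m + 1), n):
        largest = max([s[0]] + [b - a for a, b in zip(s, s[1:])])
        hits += largest > g
        total += 1
    return Fraction(hits, total)

n, m = 3, 7
for g in range(0, m - n + 2):
    print(g, survival_table(g, n, m)[n][m], brute_force(g, n, m))
```

Exact fractions keep the comparison with enumeration free of rounding error; for large $m$ floating point would be faster at the cost of that exactness.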

Figure

These graphs show the survival function $g \to P(g;n,64)$ for $n = 1, 2, 4, 8, 16, 32, 64$. As $n$ increases, the graph moves to the left, corresponding to the decreasing chances of large gaps.

Closed formulas for $P(g;n,m)$ can be obtained in many special cases, especially for large $n$, but I have not been able to obtain a closed formula that applies to all $g, n, m$. Good approximations are readily available by replacing this problem with the analogous problem for continuous uniform variables.

Finally, the expectation of $G_{n,m}$ is obtained by summing its survival function starting at $g=0$:

$$E(G_{n,m}) = \sum_{g=0}^{m-n+1} P(g;n,m).$$
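The expectation can be computed directly from this sum. The sketch below is a self-contained Python version that evaluates the survival function by memoized recursion instead of an explicit table; the function names are hypothetical, and $G_{n,m}$ is again the largest gap without the final gap after $a_{(n)}$.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def survival(g, n, m):
    """Pr(G_{n,m} > g) via the base case (1) and the recursion (2)."""
    if n == 1:
        return Fraction(max(m - g, 0), m)
    # Q(g; n, m) in the equivalent form C(m-g, n) / C(m, n)
    total = Fraction(comb(m - g, n), comb(m, n)) if m >= g else Fraction(0)
    for k in range(1, min(g, m - n + 1) + 1):      # f(k; n, m) = 0 past m-n+1
        fk = Fraction(comb(m - k, n - 1), comb(m, n))
        total += fk * survival(g, n - 1, m - k)
    return total

def expected_max_gap(n, m):
    """E(G_{n,m}) as the sum of the survival function over g = 0..m-n+1."""
    return sum(survival(g, n, m) for g in range(0, m - n + 2))

for n in (1, 2, 4, 8):
    print(n, float(expected_max_gap(n, 16)))
```

For $n = 1$ the maximum gap is just a uniform draw from $\{1, \ldots, m\}$, so the result should equal $(m+1)/2$, which gives a simple check.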

Figure 2: contour plot of expectation

This contour plot of the expectation shows contours at $2, 4, 6, \ldots, 32$, graduating from dark to light.


Suggestion: in the line "Let $G_{n,m}$ be the random variable given by the largest gap:", please add the last gap of $m+1-a_{(n)}$. Your expectation plot matches my Monte Carlo simulation.
AaronBecker