Variance of a bounded random variable


22

Suppose a random variable is bounded between a lower and an upper limit, [0,1]. How does one compute the variance of such a variable?


8
The same way as for an unbounded variable – by setting the limits of integration (or summation) appropriately.
Scortchi - Reinstate Monica

2
Just as @Scortchi said. But I'm curious why you thought it might be different?
Peter Flom

3
Unless you know nothing about the variable (in which case an upper bound on the variance could be computed from the existence of the bounds), why would the fact that it is bounded enter into the calculation?
Glen_b - Reinstate Monica 2012

6
A useful upper bound on the variance of a random variable taking values in [a,b] with probability 1 is (b-a)^2/4, and it is attained by the discrete random variable that takes the values a and b with equal probability 1/2. Another point to keep in mind is that the variance of a bounded random variable is guaranteed to exist, whereas an unbounded random variable might have no variance (some, such as Cauchy random variables, do not even have a mean).
Dilip Sarwate 2012

7
A discrete random variable taking the values a and b with equal probability 1/2 has variance exactly (b-a)^2/4. So at least we know that a universal upper bound on the variance cannot be smaller than (b-a)^2/4.
Dilip Sarwate
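Dilip's observation is easy to check numerically; a minimal sketch (the endpoints a and b below are arbitrary examples of mine):

```python
# Two-point distribution at a and b with probability 1/2 each:
# its variance attains the universal bound (b - a)^2 / 4 exactly.
a, b = 2.0, 5.0
mean = 0.5 * a + 0.5 * b
var = 0.5 * (a - mean) ** 2 + 0.5 * (b - mean) ** 2
print(var)               # 2.25
print((b - a) ** 2 / 4)  # 2.25
```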

Answers:


46

You can prove Popoviciu's inequality as follows. Use the notation m = inf X and M = sup X. Define the function g by

g(t) = E[(X - t)^2].

Computing the derivative g′ and solving

g′(t) = -2E[X] + 2t = 0,

we find that g attains its minimum at t = E[X] (note that g″ > 0).

Now, consider the value of the function g at the special point t = (M + m)/2. It must be the case that

Var[X] = g(E[X]) ≤ g((M + m)/2).

But

g((M + m)/2) = E[(X - (M + m)/2)^2] = (1/4) E[((X - m) + (X - M))^2].

Since X - m ≥ 0 and X - M ≤ 0, we have

((X - m) + (X - M))^2 ≤ ((X - m) - (X - M))^2 = (M - m)^2,

implying

(1/4) E[((X - m) + (X - M))^2] ≤ (1/4) E[((X - m) - (X - M))^2] = (M - m)^2/4.

Therefore, we have proved Popoviciu's inequality:

Var[X] ≤ (M - m)^2/4.
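The inequality can also be sanity-checked by simulation; a small sketch of mine, in which the empirical distribution of some uniform draws stands in for X:

```python
import random

# For any sample confined to [m, M], the empirical distribution is a
# distribution on [m, M], so its (population) variance must obey
# Popoviciu's bound (M - m)^2 / 4.
random.seed(0)
for _ in range(1000):
    xs = [random.uniform(0.0, 1.0) for _ in range(50)]
    m, M = min(xs), max(xs)
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    assert var <= (M - m) ** 2 / 4
print("Popoviciu's bound held in all 1000 trials")
```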


3
Nice approach: it's good to see these things demonstrated explicitly.
whuber

22
+1 Nice! I learned statistics long before the rise of computers, and one idea drilled into us was that

E[(X - t)^2] = E[((X - μ) - (t - μ))^2] = E[(X - μ)^2] + (t - μ)^2,

which allows the variance to be computed by finding the sum of squared deviations from any convenient point t and then adjusting. Here, of course, this identity gives an easy proof that g(t) has minimum value at t = μ without the need for derivatives, etc.
Dilip Sarwate

18

Let F be a distribution on [0,1]. We will show that if the variance of F is maximal, then F can have no support in the interior, from which it follows that F is Bernoulli, and the rest is trivial.

As a matter of notation, let μ_k = ∫_0^1 x^k dF(x) be the k-th raw moment of F (and, as usual, we write μ = μ_1 and σ^2 = μ_2 - μ^2 for the variance).

We know F is not supported at a single point (in that case the variance is minimal). Among other things, this implies μ lies strictly between 0 and 1. In order to argue by contradiction, suppose there is some measurable subset I in the interior (0,1) for which F(I) > 0. Without any loss of generality we may assume (by changing X to 1 - X if need be) that F(J = I ∩ (0, μ]) > 0: in other words, J is obtained by cutting off any part of I above the mean, and J has positive probability.

Let us alter F to F′ by taking all the probability out of J and placing it at 0. In so doing, μ_k changes to

μ′_k = μ_k - ∫_J x^k dF(x).

As a matter of notation, let us write [g(x)] = ∫_J g(x) dF(x) for such integrals, whence

μ′_2 = μ_2 - [x^2],  μ′ = μ - [x].

Calculate

σ′^2 = μ′_2 - μ′^2 = μ_2 - [x^2] - (μ - [x])^2 = σ^2 + ((μ[x] - [x^2]) + (μ[x] - [x]^2)).

The second term on the right, (μ[x] - [x]^2), is non-negative because μ ≥ x everywhere on J. The first term on the right can be rewritten

μ[x] - [x^2] = μ(1 - [1]) + ([μ][x] - [x^2]).

The first term on the right is strictly positive because (a) μ > 0 and (b) [1] = F(J) < 1 because we assumed F is not concentrated at a point. The second term is non-negative because it can be rewritten as [(μ - x)(x)] and this integrand is nonnegative from the assumptions μ ≥ x on J and 0 ≤ x ≤ 1. It follows that σ′^2 - σ^2 > 0.

We have just shown that under our assumptions, changing F to F′ strictly increases its variance. The only way this cannot happen, then, is when all the probability of F is concentrated at the endpoints 0 and 1, with (say) values 1 - p and p, respectively. Its variance is easily calculated to equal p(1 - p), which is maximal when p = 1/2 and equals 1/4 there.

Now when F is a distribution on [a,b], we recenter and rescale it to a distribution on [0,1]. The recentering does not change the variance, whereas the rescaling divides it by (b - a)^2. Thus an F with maximal variance on [a,b] corresponds to the distribution with maximal variance on [0,1]: it is therefore a Bernoulli(1/2) distribution rescaled and translated to [a,b], having variance (b - a)^2/4, QED.
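The mass-shifting step lends itself to a quick numerical illustration; this sketch is mine, and the two discrete distributions in it are hypothetical examples, not taken from the answer:

```python
def var_of(points, probs):
    # variance of a discrete distribution with the given atoms and weights
    mu = sum(x * p for x, p in zip(points, probs))
    return sum(p * (x - mu) ** 2 for x, p in zip(points, probs))

# F has an interior atom at 0.3, which lies below its mean (0.56)
F_points, F_probs = [0.0, 0.3, 1.0], [0.3, 0.2, 0.5]
# F' is F with that interior mass moved to the endpoint 0
Fp_points, Fp_probs = [0.0, 1.0], [0.5, 0.5]

print(var_of(F_points, F_probs))    # about 0.2044
print(var_of(Fp_points, Fp_probs))  # 0.25, the maximum on [0, 1]
```

Moving the interior mass to an endpoint strictly increases the variance, exactly as the argument predicts.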


Interesting, whuber. I didn't know this proof.
Zen

6
@Zen It's by no means as elegant as yours. I offered it because I have found myself over the years thinking in this way when confronted with much more complicated distributional inequalities: I ask how the probability can be shifted around in order to make the inequality more extreme. As an intuitive heuristic it's useful. By using approaches like the one laid out here, I suspect a general theory for proving a large class of such inequalities could be derived, with a kind of hybrid flavor of the Calculus of Variations and (finite dimensional) Lagrange multiplier techniques.
whuber

Perfect: your answer is important because it describes a more general technique that can be used to handle many other cases.
Zen

@whuber said - "I ask how the probability can be shifted around in order to make the inequality more extreme." -- this seems to be the natural way to think about such problems.
Glen_b -Reinstate Monica

There appear to be a few mistakes in the derivation. It should be

μ[x] - [x^2] = μ(1 - [1])[x] + ([μ][x] - [x^2]).

Also, [(μ - x)(x)] does not equal [μ][x] - [x^2], since [μ][x] is not the same as μ[x].
Leo

13

If the random variable is restricted to [a,b] and we know the mean μ = E[X], the variance is bounded by (b - μ)(μ - a).

Let us first consider the case a = 0, b = 1. Note that for all x ∈ [0,1] we have x^2 ≤ x, whence also E[X^2] ≤ E[X]. Using this result,

σ^2 = E[X^2] - (E[X])^2 ≤ E[X] - μ^2 = μ(1 - μ).

To generalize to intervals [a,b] with b > a, consider Y restricted to [a,b]. Define X = (Y - a)/(b - a), which is restricted to [0,1]. Equivalently, Y = (b - a)X + a, and thus

Var[Y] = (b - a)^2 Var[X] ≤ (b - a)^2 μ_X(1 - μ_X),

where the inequality is based on the first result. Now, by substituting μ_X = (μ_Y - a)/(b - a), the bound equals

(b - a)^2 · (μ_Y - a)/(b - a) · (1 - (μ_Y - a)/(b - a)) = (b - a)^2 · (μ_Y - a)/(b - a) · (b - μ_Y)/(b - a) = (μ_Y - a)(b - μ_Y),

which is the desired result.
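A numeric check of this mean-dependent bound (my own sketch; the interval [-1, 3] is an arbitrary choice), which also confirms that it is never looser than the universal (b - a)^2/4:

```python
import random

random.seed(1)
a, b = -1.0, 3.0
for _ in range(1000):
    xs = [random.uniform(a, b) for _ in range(40)]
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    # the empirical distribution lives on [a, b] with mean mu, so:
    assert var <= (b - mu) * (mu - a) + 1e-12
    # and the mean-aware bound never exceeds the universal one:
    assert (b - mu) * (mu - a) <= (b - a) ** 2 / 4 + 1e-12
print("mean-aware bound verified in all trials")
```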

8

At @user603's request....

A useful upper bound on the variance σ^2 of a random variable that takes on values in [a,b] with probability 1 is σ^2 ≤ (b - a)^2/4. A proof for the special case a = 0, b = 1 (which is what the OP asked about) can be found here on math.SE, and it is easily adapted to the more general case. As noted in my comment above and also in the answer referenced herein, a discrete random variable that takes on values a and b with equal probability 1/2 has variance (b - a)^2/4, and thus no tighter general bound can be found.

Another point to keep in mind is that a bounded random variable has finite variance, whereas for an unbounded random variable, the variance might not be finite, and in some cases might not even be definable. For example, the mean cannot be defined for Cauchy random variables, and so one cannot define the variance (as the expectation of the squared deviation from the mean).
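The contrast can be sketched in a few lines (my illustration, not part of the answer): bounded uniform draws always respect the 1/4 variance bound, while Cauchy sample means refuse to settle down as the sample grows:

```python
import math
import random

random.seed(42)

def cauchy_draw():
    # standard Cauchy via the inverse CDF: tan of a uniform angle
    return math.tan(math.pi * (random.random() - 0.5))

for n in (100, 10_000, 1_000_000):
    unif = [random.random() for _ in range(n)]   # bounded in [0, 1]
    mu = sum(unif) / n
    var_unif = sum((x - mu) ** 2 for x in unif) / n
    assert var_unif <= 0.25                      # Popoviciu's bound
    cauchy_mean = sum(cauchy_draw() for _ in range(n)) / n
    print(n, round(var_unif, 4), round(cauchy_mean, 2))
```

The uniform sample variance hovers near 1/12 at every n, while the Cauchy running mean jumps around with no tendency to converge.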


this is a special case of @Juho's answer
Aksakal

It was just a comment, but I could also add that this answer does not answer the question asked.
Aksakal

@Aksakal So??? Juho was answering a slightly different and much more recently asked question. This new question has been merged with the one you see above, which I answered ten months ago.
Dilip Sarwate

0

are you sure that this is true in general – for continuous as well as discrete distributions? Can you provide a link to the other pages? For a general distribution on [a,b] it is trivial to show that

Var(X) = E[(X - E[X])^2] ≤ E[(b - a)^2] = (b - a)^2.

I can imagine that sharper inequalities exist ... Do you need the factor 1/4 for your result?

On the other hand one can find it with the factor 1/4 under the name Popoviciu's_inequality on wikipedia.

This article looks better than the wikipedia article ...

For a uniform distribution it holds that

Var(X) = (b - a)^2/12.

This page states the result with the start of a proof that gets a bit too involved for me as it seems to require an understanding of the "Fundamental Theorem of Linear Programming". sci.tech-archive.net/Archive/sci.math/2008-06/msg01239.html
Adam Russell

Thank you for putting a name to this! "Popoviciu's Inequality" is just what I needed.
Adam Russell

2
This answer makes some incorrect suggestions: 1/4 is indeed right. The reference to Popoviciu's inequality will work, but strictly speaking it applies only to distributions with finite support (in particular, that includes no continuous distributions). A limiting argument would do the trick, but something extra is needed here.
whuber

2
A continuous distribution can approach a discrete one (in cdf terms) arbitrarily closely (e.g. construct a continuous density from a given discrete one by placing a little Beta(4,4)-shaped kernel centered at each mass point - of the appropriate area - and let the standard deviation of each such kernel shrink toward zero while keeping its area constant). Such discrete bounds as discussed here will thereby also act as bounds on continuous distributions. I expect you're thinking about continuous unimodal distributions... which indeed have different upper bounds.
Glen_b -Reinstate Monica

2
Well ... my answer was the least helpful, but I would leave it here due to the nice comments. Cheers, R
Ric
Licensed under cc by-sa 3.0 with attribution required.