Variance of the product of dependent variables


31

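What is the formula for the variance of the product of dependent variables?

For independent variables, the formula is simple:

$$\operatorname{var}(XY) = E(X^2Y^2) - [E(XY)]^2 = \operatorname{var}(X)\operatorname{var}(Y) + \operatorname{var}(X)E(Y)^2 + \operatorname{var}(Y)E(X)^2$$

But what is the formula for correlated variables?

By the way, how can I find the correlation from the statistical data?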

Answers:


32

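Well, using the familiar identity you pointed out,

$$\operatorname{var}(XY) = E(X^2Y^2) - [E(XY)]^2$$

and using the analogous formula for covariance,

$$E(X^2Y^2) = \operatorname{cov}(X^2,Y^2) + E(X^2)E(Y^2)$$

$$[E(XY)]^2 = [\operatorname{cov}(X,Y) + E(X)E(Y)]^2$$

which implies that, in general, $\operatorname{var}(XY)$ can be written as

$$\operatorname{cov}(X^2,Y^2) + [\operatorname{var}(X) + E(X)^2]\,[\operatorname{var}(Y) + E(Y)^2] - [\operatorname{cov}(X,Y) + E(X)E(Y)]^2$$

Note that in the independence case, $\operatorname{cov}(X^2,Y^2) = \operatorname{cov}(X,Y) = 0$, and this reduces to

$$[\operatorname{var}(X) + E(X)^2]\,[\operatorname{var}(Y) + E(Y)^2] - [E(X)E(Y)]^2$$

The two $[E(X)E(Y)]^2$ terms cancel, and you get

$$\operatorname{var}(X)\operatorname{var}(Y) + \operatorname{var}(X)E(Y)^2 + \operatorname{var}(Y)E(X)^2$$

as you pointed out above.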
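As a quick sanity check of the decomposition above, here is a minimal sketch that evaluates both sides exactly on a small discrete joint distribution. The pmf is made up for illustration, and the helper names `E` and `cov` are my own, not anything from the question.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of a dependent pair (X, Y): {(x, y): probability}.
pmf = {
    (0, 1): F(1, 4),
    (1, 1): F(1, 4),
    (1, 2): F(1, 4),
    (2, 3): F(1, 4),
}

def E(f):
    """Exact expectation of f(X, Y) under the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in pmf.items())

def cov(f, g):
    """Exact covariance of f(X, Y) and g(X, Y)."""
    return E(lambda x, y: f(x, y) * g(x, y)) - E(f) * E(g)

X = lambda x, y: x
Y = lambda x, y: y
X2 = lambda x, y: x * x
Y2 = lambda x, y: y * y
XY = lambda x, y: x * y

var_x, var_y = cov(X, X), cov(Y, Y)

# Left side: variance of the product, computed directly.
direct = cov(XY, XY)

# Right side: the general decomposition.
formula = (
    cov(X2, Y2)
    + (var_x + E(X) ** 2) * (var_y + E(Y) ** 2)
    - (cov(X, Y) + E(X) * E(Y)) ** 2
)

# What the independence-only formula would give for this dependent pair.
independent_only = var_x * var_y + var_x * E(Y) ** 2 + var_y * E(X) ** 2

print(direct, formula, independent_only)
```

Using exact fractions avoids any rounding question: `direct` and `formula` agree, while `independent_only` differs because both covariance terms are nonzero for this pmf.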

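Edit: If all you observe is $XY$, and not $X$ and $Y$ separately, then I don't think there is a way for you to estimate $\operatorname{cov}(X,Y)$ or $\operatorname{cov}(X^2,Y^2)$, except in special cases (for example, if $X, Y$ have means that are known a priori).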


2
Why substitute $[\operatorname{var}(X) + E(X)^2]\cdot[\operatorname{var}(Y) + E(Y)^2]$ for $E(X^2)E(Y^2)$?

1
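@user35458, so that he can end up with the equation expressed in terms of $\operatorname{var}(X)$ and $\operatorname{var}(Y)$, which makes it comparable to the OP's statement. Note that $E(X^2) = \operatorname{var}(X) + E(X)^2$.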
Waldir Leoncio

2
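In response to a now-deleted (offline) challenge to the validity of this answer, I compared its results with direct calculation of the variance of the product in many simulations. It is not a practical formula to use if you can avoid it, because it can lose substantial precision through cancellation when one large term is subtracted from another, but that's not the point. One pitfall to beware of is that this question concerns random variables. Its results apply to data provided you compute variances and covariances with denominators of $n$ rather than the $n-1$ that software usually uses.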
ub
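To illustrate the denominator pitfall in the comment above, here is a minimal sketch on made-up paired data. The function names and the `ddof` parameter (denominator is `n - ddof`) are my own choices for illustration.

```python
from statistics import fmean

# Made-up paired observations, purely for illustration.
xs = [1.0, 2.0, 4.0, 7.0, 11.0]
ys = [2.0, 1.0, 5.0, 3.0, 8.0]

def sample_cov(a, b, ddof):
    """Covariance of two equal-length samples with denominator n - ddof."""
    n = len(a)
    ma, mb = fmean(a), fmean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - ddof)

def product_variance_formula(xs, ys, ddof):
    """The decomposition from the answer, fed with sample moments."""
    x2 = [x * x for x in xs]
    y2 = [y * y for y in ys]
    mx, my = fmean(xs), fmean(ys)
    return (
        sample_cov(x2, y2, ddof)
        + (sample_cov(xs, xs, ddof) + mx ** 2) * (sample_cov(ys, ys, ddof) + my ** 2)
        - (sample_cov(xs, ys, ddof) + mx * my) ** 2
    )

def product_variance_direct(xs, ys, ddof):
    """Variance of the elementwise products, computed directly."""
    prod = [x * y for x, y in zip(xs, ys)]
    return sample_cov(prod, prod, ddof)

# Denominator n: the identity holds up to floating-point rounding.
gap_n = abs(product_variance_formula(xs, ys, 0) - product_variance_direct(xs, ys, 0))

# Denominator n - 1: the two sides no longer agree.
gap_n_minus_1 = abs(product_variance_formula(xs, ys, 1) - product_variance_direct(xs, ys, 1))

print(gap_n, gap_n_minus_1)
```

With denominator $n$ the data are treated as an empirical distribution, so the identity for random variables carries over. With $n-1$ each variance and covariance is inflated by $n/(n-1)$, but the products of means are not, so the terms no longer balance.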

14

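This is an addendum to @Macro's very nice answer, which lays out exactly what needs to be known in order to determine the variance of the product of two correlated random variables. Since

$$\begin{align}
\operatorname{var}(XY) &= E[(XY)^2] - (E[XY])^2 = E[(XY)^2] - (\operatorname{cov}(X,Y) + E[X]E[Y])^2 \tag{1}\\
&= E[X^2Y^2] - (\operatorname{cov}(X,Y) + E[X]E[Y])^2 \tag{2}\\
&= (\operatorname{cov}(X^2,Y^2) + E[X^2]E[Y^2]) - (\operatorname{cov}(X,Y) + E[X]E[Y])^2 \tag{3}
\end{align}$$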
where $\operatorname{cov}(X,Y)$, $E[X]$, $E[Y]$, $E[X^2]$, and $E[Y^2]$ can be assumed to be known quantities, we need to be able to determine the value of $E[X^2Y^2]$ in (2) or $\operatorname{cov}(X^2,Y^2)$ in (3). This is not easy to do in general, but, as pointed out already, if $X$ and $Y$ are independent random variables, then $\operatorname{cov}(X,Y) = \operatorname{cov}(X^2,Y^2) = 0$. In fact, dependence, not correlation (or lack thereof), is the key issue. Knowing that $\operatorname{cov}(X,Y)$ equals $0$ instead of some nonzero value does not, by itself, help in the least in our efforts to determine the value of $E[X^2Y^2]$ or $\operatorname{cov}(X^2,Y^2)$, even though it does simplify the right sides of (2) and (3) a little.
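A small example of why zero correlation alone does not help: the sketch below uses a textbook-style pair of my own choosing (not from the answer), with $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$, which is uncorrelated but clearly dependent.

```python
from fractions import Fraction as F

# Illustrative choice: X uniform on {-1, 0, 1} and Y = X**2.
support = [(-1, 1), (0, 0), (1, 1)]
p = F(1, 3)

def E(f):
    """Exact expectation of f(X, Y) over the three equally likely points."""
    return sum(p * f(x, y) for x, y in support)

def cov(f, g):
    """Exact covariance of f(X, Y) and g(X, Y)."""
    return E(lambda x, y: f(x, y) * g(x, y)) - E(f) * E(g)

X = lambda x, y: x
Y = lambda x, y: y
XY = lambda x, y: x * y

cov_xy = cov(X, Y)                                      # 0: uncorrelated
cov_x2y2 = cov(lambda x, y: x * x, lambda x, y: y * y)  # 2/9: still dependent
var_xy = cov(XY, XY)                                    # 2/3

# The independence-only formula misses the cov(X^2, Y^2) term.
var_x, var_y = cov(X, X), cov(Y, Y)
independent_formula = var_x * var_y + var_x * E(Y) ** 2 + var_y * E(X) ** 2  # 4/9

print(cov_xy, cov_x2y2, var_xy, independent_formula)
```

Here the gap between the true variance and the independence formula is exactly $\operatorname{cov}(X^2,Y^2)$, the term that uncorrelatedness says nothing about.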

When $X$ and $Y$ are dependent random variables, then in at least one (fairly common or fairly important) special case, it is possible to find the value of $E[X^2Y^2]$ relatively easily.

Suppose that $X$ and $Y$ are jointly normal random variables with correlation coefficient $\rho$. Then, conditioned on $X = x$, the conditional density of $Y$ is a normal density with mean $E[Y] + \rho\sqrt{\frac{\operatorname{var}(Y)}{\operatorname{var}(X)}}\,(x - E[X])$ and variance $\operatorname{var}(Y)(1 - \rho^2)$. Thus,

$$E[X^2Y^2 \mid X] = X^2 E[Y^2 \mid X] = X^2\left[\operatorname{var}(Y)(1 - \rho^2) + \left(E[Y] + \rho\sqrt{\frac{\operatorname{var}(Y)}{\operatorname{var}(X)}}\,(X - E[X])\right)^2\right]$$

which is a quartic function of $X$, say $g(X)$, and the Law of Iterated Expectation tells us that

$$E[X^2Y^2] = E\big[E[X^2Y^2 \mid X]\big] = E[g(X)] \tag{4}$$

where the right side of (4) can be computed from knowledge of the 3rd and 4th moments of $X$. These are standard results that can be found in many texts and reference books (meaning that I am too lazy to look them up and include them in this answer).
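For what it's worth, here is a minimal sketch of the computation in (4), assuming the standard raw moments of a normal variable with mean $\mu$ and variance $\sigma^2$: $E[X^2] = \mu^2 + \sigma^2$, $E[X^3] = \mu^3 + 3\mu\sigma^2$, $E[X^4] = \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4$. The function and parameter names are mine, chosen for illustration.

```python
import math

def e_x2y2_via_conditioning(mu_x, mu_y, var_x, var_y, rho):
    """E[X^2 Y^2] = E[g(X)] for jointly normal (X, Y), using raw moments of X."""
    # Conditional mean of Y given X is a + b*X; conditional variance is c.
    b = rho * math.sqrt(var_y / var_x)
    a = mu_y - b * mu_x
    c = var_y * (1.0 - rho ** 2)
    # Raw moments of a normal variable with mean mu_x and variance var_x.
    m2 = mu_x ** 2 + var_x
    m3 = mu_x ** 3 + 3.0 * mu_x * var_x
    m4 = mu_x ** 4 + 6.0 * mu_x ** 2 * var_x + 3.0 * var_x ** 2
    # g(X) = X^2 * (c + (a + b*X)^2) = (c + a^2) X^2 + 2ab X^3 + b^2 X^4
    return (c + a * a) * m2 + 2.0 * a * b * m3 + b * b * m4

def var_xy_joint_normal(mu_x, mu_y, var_x, var_y, rho):
    """var(XY) from (2): E[X^2 Y^2] - (cov(X, Y) + E[X] E[Y])^2."""
    e_xy = rho * math.sqrt(var_x * var_y) + mu_x * mu_y
    return e_x2y2_via_conditioning(mu_x, mu_y, var_x, var_y, rho) - e_xy ** 2

# Standard normals with rho = 0.5: E[X^2 Y^2] = 1 + 2*rho^2, so var(XY) = 1 + rho^2.
print(var_xy_joint_normal(0.0, 0.0, 1.0, 1.0, 0.5))  # 1.25
```

Setting `rho = 0` recovers the independence formula from the question, which is a useful check on the algebra.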

Further addendum: In a now-deleted answer, @Hydrologist gives the variance of $xy$ as

$$\begin{align}
\operatorname{Var}[xy] &= (E[x])^2\operatorname{Var}[y] + (E[y])^2\operatorname{Var}[x] + 2E[x]\operatorname{Cov}[x,y^2] + 2E[y]\operatorname{Cov}[x^2,y] \\
&\quad + 2E[x]E[y]\operatorname{Cov}[x,y] + \operatorname{Cov}[x^2,y^2] - (\operatorname{Cov}[x,y])^2 \tag{5}
\end{align}$$

and claims that this formula is from two papers published a half-century ago in JASA. This formula is an incorrect transcription of the results in the paper(s) cited by Hydrologist. Specifically, $\operatorname{Cov}[x^2,y^2]$ is a mistranscription of $E[(x - E[x])^2(y - E[y])^2]$ in the journal article, and similarly for $\operatorname{Cov}[x^2,y]$ and $\operatorname{Cov}[x,y^2]$.
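The difference is easy to see numerically. The sketch below evaluates formula (5) twice on a small made-up joint pmf: once with central mixed moments in the three disputed slots (my reading of the correction described above, which also follows from expanding $xy$ around the means), and once with the covariances of powers as quoted. All names are illustrative.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (x, y), chosen only for illustration.
pmf = {(0, 1): F(1, 4), (1, 1): F(1, 4), (1, 2): F(1, 4), (2, 3): F(1, 4)}

def E(f):
    """Exact expectation of f(x, y) under the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in pmf.items())

mx, my = E(lambda x, y: x), E(lambda x, y: y)

def central(i, j):
    """Central mixed moment E[(x - E[x])^i (y - E[y])^j]."""
    return E(lambda x, y: (x - mx) ** i * (y - my) ** j)

def cov_pow(i, j):
    """Cov[x^i, y^j], the quantity that appears in (5) as quoted."""
    return E(lambda x, y: x ** i * y ** j) - E(lambda x, y: x ** i) * E(lambda x, y: y ** j)

# Variance of the product, computed directly.
direct = E(lambda x, y: (x * y) ** 2) - E(lambda x, y: x * y) ** 2

def assemble(m12, m21, m22):
    """Formula (5) with the three disputed terms passed in as arguments."""
    return (mx ** 2 * central(0, 2) + my ** 2 * central(2, 0)
            + 2 * mx * m12 + 2 * my * m21
            + 2 * mx * my * central(1, 1)
            + m22 - central(1, 1) ** 2)

with_central_moments = assemble(central(1, 2), central(2, 1), central(2, 2))
as_quoted_in_5 = assemble(cov_pow(1, 2), cov_pow(2, 1), cov_pow(2, 2))

print(direct, with_central_moments, as_quoted_in_5)
```

On this pmf the central-moment version reproduces the directly computed variance exactly, while the version with $\operatorname{Cov}[x^2,y^2]$, $\operatorname{Cov}[x^2,y]$ and $\operatorname{Cov}[x,y^2]$ does not.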

For the computation of $E(X^2Y^2)$ in the joint normal case, also see math.stackexchange.com/questions/668641/…
Samuel Reid
Licensed under cc by-sa 3.0 with attribution required.