Expected prediction error - derivation



I am struggling to understand the derivation of the expected prediction error in ESL, in particular the derivations of (2.11) and (2.12) (the conditioning step, and the step down to the pointwise minimum). Any pointers or links would be much appreciated.

I reproduce below an excerpt from ESL, p. 18. The first two formulas are, in order, equations (2.11) and (2.12).


Let $X \in \mathbb{R}^p$ denote a real-valued random input vector, and $Y \in \mathbb{R}$ a real-valued random output variable, with joint distribution $\Pr(X, Y)$. We seek a function $f(X)$ for predicting $Y$ given values of the input $X$. This theory requires a loss function $L(Y, f(X))$ for penalizing errors in prediction, and by far the most common and convenient is squared error loss $L(Y, f(X)) = (Y - f(X))^2$. This leads us to a criterion for choosing $f$,

$$EPE(f) = E(Y - f(X))^2 = \int [y - f(x)]^2 \Pr(dx, dy),$$

the expected (squared) prediction error. By conditioning on $X$, we can write EPE as

$$EPE(f) = E_X E_{Y|X}\left([Y - f(X)]^2 \mid X\right)$$

and we see that it suffices to minimize EPE pointwise:

$$f(x) = \operatorname{argmin}_c E_{Y|X}\left([Y - c]^2 \mid X = x\right).$$

The solution is

$$f(x) = E(Y \mid X = x),$$

the conditional expectation, also known as the regression function.
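As a quick numeric sanity check of the excerpt's final claim, the sketch below (a made-up toy model, not from ESL) verifies that the constant $c$ minimizing $E\left([Y - c]^2 \mid X = x\right)$ is the conditional mean $E(Y \mid X = x)$:

```python
# Sketch: verify numerically that the c minimizing E[(Y - c)^2 | X = x]
# is the conditional mean E[Y | X = x].
# Assumed toy model: Y = 2X + eps, eps ~ N(0, 1), so E[Y | X = x] = 2x.
import numpy as np

rng = np.random.default_rng(0)
x = 1.5                                     # condition on a fixed input value
y = 2 * x + rng.normal(0.0, 1.0, 100_000)   # draws from Y | X = x

candidates = np.linspace(0.0, 6.0, 601)     # candidate constants c
risk = [np.mean((y - c) ** 2) for c in candidates]
best_c = candidates[int(np.argmin(risk))]

print(best_c)        # close to E[Y | X = x] = 3.0
print(np.mean(y))    # the sample conditional mean, also close to 3.0
```

The grid minimizer lands on the sample mean of the conditional draws, in agreement with (2.13).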


Swapping $X$ and $Y$ in the first equation of the Wikipedia article on the law of total expectation gives the equivalence of (2.9) and (2.11); read that article for the proof. (2.12) is immediate, because $f$ is to be chosen to minimize the EPE.
ub


For those who are also working through this book, check out the write-up by Weathermax and Epstein.

@Dodgie That link has gone :(
Matthew Drury

@MatthewDrury Luckily, a search for "Weathermax and Epstein statistics" returned the link as the first result ;) - waxworksmath.com/Authors/G_M/Hastie/WriteUp/…
Dodgie

Answers:



$$
\begin{aligned}
EPE(f) &= \int [y - f(x)]^2 \Pr(dx, dy) \\
&= \int [y - f(x)]^2 \, p(x, y) \, dx \, dy \\
&= \int_x \int_y [y - f(x)]^2 \, p(x, y) \, dy \, dx \\
&= \int_x \int_y [y - f(x)]^2 \, p(x) \, p(y \mid x) \, dy \, dx \\
&= \int_x \left( \int_y [y - f(x)]^2 \, p(y \mid x) \, dy \right) p(x) \, dx \\
&= \int_x E_{Y \mid X}\left([Y - f(X)]^2 \mid X = x\right) p(x) \, dx \\
&= E_X E_{Y \mid X}\left([Y - f(X)]^2 \mid X = x\right)
\end{aligned}
$$
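The first and last lines of this chain can be checked numerically. The sketch below (an assumed toy model, not from the answer) estimates the EPE both directly over the joint distribution and via the iterated conditional expectation, using the fact that the inner expectation has the closed form $\operatorname{Var}(Y \mid X = x) + (E(Y \mid X = x) - f(x))^2$:

```python
# Sketch: Monte Carlo check that
#   EPE(f) = E[(Y - f(X))^2] = E_X[ E_{Y|X}([Y - f(X)]^2 | X) ].
# Assumed model: X ~ N(0,1), Y | X = x ~ N(x, 1); take f(x) = 0.5 * x.
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: 0.5 * x

x = rng.normal(size=500_000)
y = x + rng.normal(size=x.size)          # Y = X + eps, so Y | X = x ~ N(x, 1)

epe_joint = np.mean((y - f(x)) ** 2)     # direct expectation over the joint

# Inner conditional expectation in closed form:
#   E[(Y - f(x))^2 | X = x] = Var(Y|X=x) + (E[Y|X=x] - f(x))^2 = 1 + (x - f(x))^2
epe_iterated = np.mean(1.0 + (x - f(x)) ** 2)

print(epe_joint, epe_iterated)           # both ≈ 1 + E[(0.5 X)^2] = 1.25
```

Both estimates agree with each other and with the analytic value $1 + 0.25 \cdot \operatorname{Var}(X) = 1.25$.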

I understand what you wrote, but do you think that if the OP was confused by the derivation shown in the question, he/she would understand your answer? Of course, I already understand the derivation shown in the question.
Mark L. Stone

I came here from Google with the same question, and actually found this derivation to be exactly what I needed.
semicolon and tape, 2016

@MarkL.Stone - this might be a stupid question, but could you explain the meaning of $\Pr(dx, dy)$ and how it becomes $p(x, y)\,dx\,dy$? Thanks a lot
Xavier Bourret Sicotte,

The former is the latter. I think it is more common to use $dP(x, y)$ or $dF(x, y)$. In 1D you will often see $dF(x)$ used to mean $f(x)\,dx$, where $f(x)$ is a probability density function, but this notation also accommodates a discrete probability mass function (via summation), or even a mixture of continuous density and discrete probability mass.
Mark L. Stone

Wouldn't it be more precise to say $E_X\left(E_{Y \mid X}\left([Y - f(X)]^2 \mid X = x\right)\right)$ (the last formula)?
D1X


Equation (2.11) is a consequence of the following equality. For any two random variables $Z_1$ and $Z_2$, and any function $g$,

$$E_{Z_1, Z_2}\left(g(Z_1, Z_2)\right) = E_{Z_2}\left(E_{Z_1 \mid Z_2}\left(g(Z_1, Z_2) \mid Z_2\right)\right)$$

The notation $E_{Z_1, Z_2}$ is the expectation over the joint distribution. The notation $E_{Z_1 \mid Z_2}$ essentially says "integrate over the conditional distribution of $Z_1$ as if $Z_2$ were fixed".

It's easy to verify this in the case that $Z_1$ and $Z_2$ are discrete random variables by just unwinding the definitions involved:

$$
\begin{aligned}
E_{Z_2}\left(E_{Z_1 \mid Z_2}\left(g(Z_1, Z_2) \mid Z_2\right)\right)
&= E_{Z_2}\left(\sum_{z_1} g(z_1, Z_2)\Pr(Z_1 = z_1 \mid Z_2)\right) \\
&= \sum_{z_2}\left(\sum_{z_1} g(z_1, z_2)\Pr(Z_1 = z_1 \mid Z_2 = z_2)\right)\Pr(Z_2 = z_2) \\
&= \sum_{z_1, z_2} g(z_1, z_2)\Pr(Z_1 = z_1 \mid Z_2 = z_2)\Pr(Z_2 = z_2) \\
&= \sum_{z_1, z_2} g(z_1, z_2)\Pr(Z_1 = z_1, Z_2 = z_2) \\
&= E_{Z_1, Z_2}\left(g(Z_1, Z_2)\right)
\end{aligned}
$$
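This discrete identity can be verified mechanically on any small joint pmf. The sketch below uses a made-up table of probabilities and an arbitrary function $g$; nothing about it is specific to this choice:

```python
# Sketch: verify the discrete identity
#   E_{Z1,Z2}[g(Z1,Z2)] = E_{Z2}[ E_{Z1|Z2}[g(Z1,Z2) | Z2] ]
# on an arbitrary small joint pmf (values and probabilities are made up).
import numpy as np

z1_vals = np.array([0.0, 1.0, 2.0])
z2_vals = np.array([-1.0, 1.0])
# Joint pmf Pr(Z1 = z1, Z2 = z2); rows index z1, columns index z2.
joint = np.array([[0.10, 0.20],
                  [0.15, 0.25],
                  [0.05, 0.25]])
assert np.isclose(joint.sum(), 1.0)

g = lambda z1, z2: (z1 - 0.3 * z2) ** 2    # any function g works here

G = g(z1_vals[:, None], z2_vals[None, :])  # g evaluated on the grid

lhs = np.sum(G * joint)                    # expectation over the joint

p_z2 = joint.sum(axis=0)                   # marginal Pr(Z2 = z2)
cond = joint / p_z2                        # Pr(Z1 = z1 | Z2 = z2), columnwise
inner = np.sum(G * cond, axis=0)           # E[g(Z1, z2) | Z2 = z2] for each z2
rhs = np.sum(inner * p_z2)                 # outer expectation over Z2

print(lhs, rhs)                            # identical up to rounding
```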

The continuous case can either be viewed informally as a limit of this argument, or formally verified once all the measure-theoretic doodads are in place.

To unwind the application, take $Z_1 = Y$, $Z_2 = X$, and $g(x, y) = (y - f(x))^2$. Everything lines up exactly.

The assertion (2.12) asks us to consider minimizing

$$E_X E_{Y \mid X}\left(Y - f(X)\right)^2$$

where we are free to choose f as we wish. Again, focusing on the discrete case, and dropping halfway into the unwinding above, we see that we are minimizing

$$\sum_x \left( \sum_y (y - f(x))^2 \Pr(Y = y \mid X = x) \right) \Pr(X = x)$$

Everything inside the big parenthesis is non-negative, and you can minimize a sum of non-negative quantities by minimizing the summands individually. In context, this means that we can choose f to minimize

$$\sum_y (y - f(x))^2 \Pr(Y = y \mid X = x)$$

individually for each discrete value of x. This is exactly the content of what ESL is claiming, only with fancier notation.
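To see that pointwise minimization really does recover the global minimizer, the sketch below brute-forces the discrete problem on a made-up joint pmf: it searches over all candidate functions $f = (f(0), f(1))$ and checks that the global minimizer of the total sum matches the per-$x$ conditional means:

```python
# Sketch: brute-force check (made-up discrete pmf) that minimizing the total
#   sum_x [ sum_y (y - f(x))^2 Pr(Y=y|X=x) ] Pr(X=x)
# over f is the same as minimizing each inner bracket separately for each x.
import itertools
import numpy as np

y_vals = np.array([0.0, 1.0, 2.0])
joint = np.array([[0.10, 0.20],     # rows: y, cols: x -- Pr(X = x, Y = y)
                  [0.15, 0.25],
                  [0.05, 0.25]])
p_x = joint.sum(axis=0)             # marginal Pr(X = x)
cond = joint / p_x                  # Pr(Y = y | X = x), columnwise

candidates = np.linspace(0.0, 2.0, 201)   # candidate values for f(x)

def total_risk(f0, f1):
    inner0 = np.sum((y_vals - f0) ** 2 * cond[:, 0])
    inner1 = np.sum((y_vals - f1) ** 2 * cond[:, 1])
    return inner0 * p_x[0] + inner1 * p_x[1]

# Global search over all candidate functions f = (f(0), f(1)).
best = min(itertools.product(candidates, candidates),
           key=lambda f: total_risk(*f))

# Pointwise minimizers: the conditional means E[Y | X = x].
cond_means = cond.T @ y_vals
print(best, cond_means)             # agree up to the grid resolution
```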



I find that some parts of this book are expressed in a way that is difficult to understand, especially for those who do not have a strong background in statistics.

I will try to make it simple and hope that you can get rid of confusion.

Claim 1 (Smoothing): $E(X) = E(E(X \mid Y))$ for any random variables $X$ and $Y$.

Proof: Notice that $E(X)$ is a constant, while $E(X \mid Y)$ is a random variable depending on $Y$.

$$
\begin{aligned}
E(E(X \mid Y)) &= \int E(X \mid Y = y)\, f_Y(y)\,dy \\
&= \int \left(\int x\, f_{X \mid Y}(x \mid y)\,dx\right) f_Y(y)\,dy \\
&= \int\!\!\int x\, f_{X \mid Y}(x \mid y)\, f_Y(y)\,dx\,dy \\
&= \int\!\!\int x\, f_{XY}(x, y)\,dx\,dy \\
&= \int x \left(\int f_{XY}(x, y)\,dy\right) dx \\
&= \int x\, f_X(x)\,dx \\
&= E(X)
\end{aligned}
$$

Claim 2: $E(Y - f(X))^2 \ge E(Y - E(Y \mid X))^2$ for any $f$.

Proof:

$$
\begin{aligned}
E\left((Y - f(X))^2 \mid X\right)
&= E\left(\left([Y - E(Y \mid X)] + [E(Y \mid X) - f(X)]\right)^2 \mid X\right) \\
&= E\left((Y - E(Y \mid X))^2 \mid X\right) + E\left((E(Y \mid X) - f(X))^2 \mid X\right) \\
&\quad + 2\,E\left((Y - E(Y \mid X))(E(Y \mid X) - f(X)) \mid X\right) \\
&= E\left((Y - E(Y \mid X))^2 \mid X\right) + E\left((E(Y \mid X) - f(X))^2 \mid X\right) \\
&\quad + 2\,(E(Y \mid X) - f(X))\, E\left(Y - E(Y \mid X) \mid X\right)
\quad \text{(since } E(Y \mid X) - f(X) \text{ is constant given } X\text{)} \\
&= E\left((Y - E(Y \mid X))^2 \mid X\right) + E\left((E(Y \mid X) - f(X))^2 \mid X\right)
\quad \text{(since } E(Y - E(Y \mid X) \mid X) = 0\text{)} \\
&\ge E\left((Y - E(Y \mid X))^2 \mid X\right)
\end{aligned}
$$

Taking the expectation of both sides of the above inequality gives Claim 2. (Q.E.D.)

Therefore, the optimal $f$ is $f(X) = E(Y \mid X)$.
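Claim 2 can also be seen empirically. The sketch below (an assumed toy model, not from this answer) estimates the mean squared error of the regression function $f(X) = E(Y \mid X)$ and of several alternative predictors; the regression function always comes out smallest:

```python
# Sketch: empirical check of Claim 2 -- the regression function
# f(X) = E[Y|X] attains the smallest mean squared error.
# Assumed model: X ~ Uniform(-2, 2), Y = sin(X) + eps, eps ~ N(0, 0.5),
# so E[Y | X = x] = sin(x) and the best achievable MSE is Var(eps) = 0.25.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-2.0, 2.0, 400_000)
y = np.sin(x) + rng.normal(0.0, 0.5, x.size)

def mse(f):
    return np.mean((y - f(x)) ** 2)

best = mse(np.sin)                        # the conditional expectation
others = [mse(lambda t: t),               # a linear predictor
          mse(lambda t: 0.9 * np.sin(t)),  # a slightly shrunk predictor
          mse(lambda t: np.zeros_like(t))] # the constant-zero predictor

print(best, others)   # best ≈ 0.25; every alternative is strictly larger
```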

Licensed under cc by-sa 3.0 with attribution required.