More steps of the bias-variance decomposition
Indeed, the full derivation is rarely given in textbooks, since it involves a lot of uninspiring algebra. Here is a more complete derivation using the notation of the book "The Elements of Statistical Learning" (page 223).
If we assume that $Y = f(X) + \epsilon$ with $E[\epsilon] = 0$ and $\operatorname{Var}(\epsilon) = \sigma_\epsilon^2$, then we can derive an expression for the expected prediction error of a regression fit $\hat{f}(X)$ at an input point $X = x_0$, using squared-error loss:
$$\operatorname{Err}(x_0) = E\big[(Y - \hat{f}(x_0))^2 \mid X = x_0\big]$$
For notational simplicity, let $\hat{f}(x_0) = \hat{f}$ and $f(x_0) = f$, and recall that $E[f] = f$ and $E[Y] = f$.
$$
\begin{aligned}
E[(Y - \hat{f})^2] &= E[(Y - f + f - \hat{f})^2] \\
&= E[(Y - f)^2] + E[(f - \hat{f})^2] + 2\,E[(f - \hat{f})(Y - f)] \\
&= E[(f + \epsilon - f)^2] + E[(f - \hat{f})^2] + 2\,E[fY - f^2 - \hat{f}Y + \hat{f}f] \\
&= E[\epsilon^2] + E[(f - \hat{f})^2] + 2\,(f^2 - f^2 - f\,E[\hat{f}] + f\,E[\hat{f}]) \\
&= \sigma_\epsilon^2 + E[(f - \hat{f})^2] + 0
\end{aligned}
$$
For the term $E[(f - \hat{f})^2]$ we can use a similar trick as above, adding and subtracting $E[\hat{f}]$ to get
$$
\begin{aligned}
E[(f - \hat{f})^2] &= E\big[(f - E[\hat{f}] + E[\hat{f}] - \hat{f})^2\big] \\
&= E\big[(f - E[\hat{f}])^2\big] + E\big[(\hat{f} - E[\hat{f}])^2\big] \\
&= \big(f - E[\hat{f}]\big)^2 + E\big[(\hat{f} - E[\hat{f}])^2\big] \\
&= \operatorname{Bias}^2[\hat{f}] + \operatorname{Var}[\hat{f}]
\end{aligned}
$$
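For completeness, one step here is usually left implicit: the cross term from the expansion vanishes because $f - E[\hat{f}]$ is a constant and $E\big[E[\hat{f}] - \hat{f}\big] = 0$, so

$$2\,E\big[(f - E[\hat{f}])(E[\hat{f}] - \hat{f})\big] = 2\,(f - E[\hat{f}])\,E\big[E[\hat{f}] - \hat{f}\big] = 0$$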
Putting it together
$$E[(Y - \hat{f})^2] = \sigma_\epsilon^2 + \operatorname{Bias}^2[\hat{f}] + \operatorname{Var}[\hat{f}]$$
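To see the identity in action, here is a minimal Monte Carlo sketch in Python. Everything in it is an illustrative assumption rather than part of the derivation: the true function `f`, the noise level `sigma_eps`, the evaluation point `x0`, and the degree-3 polynomial used as the estimator $\hat{f}$.

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: np.sin(2 * x)   # illustrative true function (assumption)
sigma_eps = 0.3               # noise standard deviation (assumption)
x0 = 0.5                      # evaluation point x_0
n, reps = 50, 10000           # training-set size, Monte Carlo replications

# For each replication: draw a fresh training set and record f_hat(x0)
preds = np.empty(reps)
for r in range(reps):
    X = rng.uniform(-1, 1, n)
    Y = f(X) + rng.normal(0, sigma_eps, n)
    coefs = np.polyfit(X, Y, deg=3)   # illustrative estimator f_hat
    preds[r] = np.polyval(coefs, x0)

# Independent new responses Y at X = x0, one per replication
y_new = f(x0) + rng.normal(0, sigma_eps, reps)

epe = np.mean((y_new - preds) ** 2)   # E[(Y - f_hat)^2 | X = x0]
bias2 = (f(x0) - preds.mean()) ** 2   # Bias^2[f_hat]
var = preds.var()                     # Var[f_hat]

print(f"expected prediction error : {epe:.4f}")
print(f"sigma_eps^2 + Bias^2 + Var: {sigma_eps**2 + bias2 + var:.4f}")
```

Up to Monte Carlo error, the two printed numbers should agree.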
Some comments on why $E[\hat{f}Y] = f\,E[\hat{f}]$
Taken from Alecos Papadopoulos here
Recall that $\hat{f}$ is the predictor we have constructed based on the $m$ data points $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$, so we can write $\hat{f} = \hat{f}_m$ to remember that.
On the other hand, $Y$ is the response at a new data point $(x^{(m+1)}, y^{(m+1)})$ that we are trying to predict using the model constructed on the $m$ data points above. So the mean squared error can be written as
$$E\Big[\big(\hat{f}_m(x^{(m+1)}) - y^{(m+1)}\big)^2\Big]$$
Expanding the equation from the previous section
$$E[\hat{f}_m Y] = E\big[\hat{f}_m (f + \epsilon)\big] = E\big[\hat{f}_m f + \hat{f}_m \epsilon\big] = E[\hat{f}_m f] + E[\hat{f}_m \epsilon]$$
The last part of the equation can be viewed as
$$E\big[\hat{f}_m(x^{(m+1)}) \cdot \epsilon^{(m+1)}\big] = 0$$
Since we make the following assumptions about the point $x^{(m+1)}$ (a small numerical check follows the list):
- It was not used when constructing $\hat{f}_m$
- It is independent of all other observations $\{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}$
- It is independent of $\epsilon^{(m+1)}$
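As a quick empirical illustration of this claim (using the same illustrative setup as the sketch above, which is an assumption rather than part of the quoted argument): because the new point never enters the fit, the product $\hat{f}_m(x^{(m+1)}) \cdot \epsilon^{(m+1)}$ averages to roughly zero.

```python
import numpy as np

rng = np.random.default_rng(1)

f = lambda x: np.sin(2 * x)   # illustrative true function (assumption)
sigma_eps, m, reps = 0.3, 50, 10000

fhat_new = np.empty(reps)
eps_new = np.empty(reps)
for r in range(reps):
    # Fit f_hat_m on the m training points only
    X = rng.uniform(-1, 1, m)
    Y = f(X) + rng.normal(0, sigma_eps, m)
    coefs = np.polyfit(X, Y, deg=3)
    # A new point (x^(m+1), eps^(m+1)) that never entered the fit
    x_new = rng.uniform(-1, 1)
    eps_new[r] = rng.normal(0, sigma_eps)
    fhat_new[r] = np.polyval(coefs, x_new)

# E[f_hat_m(x^(m+1)) * eps^(m+1)] should be close to 0
print(np.mean(fhat_new * eps_new))
```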
Other sources with full derivations