For what models does the bias of the MLE fall faster than the variance?



Let $\hat\theta$ denote the maximum likelihood estimate of a true parameter $\theta$ of some model. As the number of data points $n$ increases, the error $\lVert\hat\theta-\theta\rVert$ typically decreases as $O(1/\sqrt{n})$, and this implies that both the bias $\lVert \mathrm{E}\hat\theta-\theta\rVert$ and the deviation $\lVert \mathrm{E}\hat\theta-\hat\theta\rVert$ decrease at the same $O(1/\sqrt{n})$ rate.

I am interested in models whose bias shrinks faster than $O(1/\sqrt{n})$, but where the error does not shrink at this faster rate, because the deviation still shrinks as $O(1/\sqrt{n})$. In particular, I would like to know sufficient conditions for a model's bias to shrink at the rate $O(1/n)$.
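To make the two rates concrete, here is a minimal Monte Carlo sketch (assuming NumPy; the exponential-rate MLE $\hat\theta=1/\bar{x}$ is used purely as an illustration of an estimator whose bias is $O(1/n)$ while its deviation is $O(1/\sqrt{n})$):

```python
# Estimate the bias and the standard deviation of an estimator at several
# sample sizes and read off their rates from log-log slopes.
# Illustration: for the MLE of an exponential rate, theta_hat = 1/xbar,
# E[theta_hat] = n*theta/(n-1), so the bias is theta/(n-1) = O(1/n),
# while sd(theta_hat) ~ theta/sqrt(n) = O(1/sqrt(n)).
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0                                   # true exponential rate
ns = np.array([50, 200, 800, 3200])
reps = 1_000_000

bias, sd = [], []
for n in ns:
    # the sample mean of n iid Exp(rate=theta) draws is Gamma(shape=n, scale=1/(n*theta))
    xbar = rng.gamma(shape=n, scale=1.0 / (n * theta), size=reps)
    theta_hat = 1.0 / xbar                    # MLE of the rate
    bias.append(abs(theta_hat.mean() - theta))
    sd.append(theta_hat.std())

# log-log slopes: roughly -1 for the bias, -0.5 for the deviation
print("bias ~ n^%.2f" % np.polyfit(np.log(ns), np.log(bias), 1)[0])
print("sd   ~ n^%.2f" % np.polyfit(np.log(ns), np.log(sd), 1)[0])
```

The two printed slopes should come out near $-1$ and $-0.5$, the kind of gap between bias rate and deviation rate that the question is about.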


You mean $\lVert\hat\theta-\theta\rVert=\sqrt{(\hat\theta-\theta)^2}$, indeed? Or something else?
Alecos Papadopoulos

Yes, I am asking specifically about the L2 norm. But if it makes the question easier to answer, I would also be interested in other norms.
Mike Izbicki

$(\hat\theta-\theta)^2$ is $O_p(1/n)$.
Alecos Papadopoulos

Sorry, I can't quite parse your comment. For the $d$-dimensional L2 norm, $\lVert a-b\rVert=\sqrt{\sum_{i=1}^{d}(a_i-b_i)^2}$, so the convergence rate is $O(1/\sqrt{n})$. I agree that if we square it, then it converges as $O(1/n)$.
Mike Izbicki

Have you seen the ridge regression paper (Hoerl & Kennard, 1970)? I believe it gives conditions on the design matrix plus the penalty under which this holds.
dcl

Answers:



Generally, you need models in which the MLE is not asymptotically normal but converges to some other distribution (and does so at a faster rate). This typically happens when the parameter being estimated lies on the boundary of the parameter space. Intuitively, the MLE then approaches the parameter "only from one side", so it "improves" on the usual convergence speed because it is not "distracted" by moving "back and forth" around the parameter.

A standard example is the MLE of $\theta$ based on an i.i.d. sample of $U(0,\theta)$ uniform random variables. The MLE here is the maximum order statistic,

$$\hat\theta_n = u_{(n)}$$

Its finite-sample distribution is

$$F_{\hat\theta_n} = \frac{(\hat\theta_n)^n}{\theta^n}, \qquad f_{\hat\theta_n} = \frac{n(\hat\theta_n)^{n-1}}{\theta^n}$$

$$E(\hat\theta_n) = \frac{n}{n+1}\theta \implies B(\hat\theta_n) = E(\hat\theta_n)-\theta = -\frac{1}{n+1}\theta$$

So $B(\hat\theta_n)=O(1/n)$. But the same rate also applies to the deviation: $\operatorname{Var}(\hat\theta_n)=O(1/n^2)$, so the standard deviation is $O(1/n)$ as well.
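A quick numerical check of this (a sketch assuming NumPy; the exact values are $E(\hat\theta_n)-\theta=-\theta/(n+1)$ and $\operatorname{Var}(\hat\theta_n)=n\theta^2/\big[(n+1)^2(n+2)\big]$):

```python
# Bias and sd of the MLE theta_hat_n = max(u_1, ..., u_n) for U(0, theta) data.
# Exact values: bias = -theta/(n+1), Var = n*theta^2/((n+1)^2*(n+2)),
# so both the bias and the sd shrink at the O(1/n) rate.
# Uses max(u_1,...,u_n) =d theta * U**(1/n) with U ~ U(0,1) to avoid huge samples.
import numpy as np

rng = np.random.default_rng(1)
theta, reps = 3.0, 1_000_000
for n in (10, 100, 1000):
    mle = theta * rng.random(reps) ** (1.0 / n)
    exact_sd = np.sqrt(n * theta**2 / ((n + 1) ** 2 * (n + 2)))
    print(f"n={n:5d}  bias={mle.mean() - theta:+.6f} (exact {-theta/(n+1):+.6f})"
          f"  sd={mle.std():.6f} (exact {exact_sd:.6f})")
```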

You can also verify that, to obtain a limiting distribution, we need to look at the variable $n(\theta-\hat\theta_n)$ (i.e. we need to scale by $n$), because

$$P\big[n(\theta-\hat\theta_n)\leq z\big] = 1 - P\big[\hat\theta_n \leq \theta-(z/n)\big]$$

$$= 1 - \frac{1}{\theta^n}\left(\theta-\frac{z}{n}\right)^n = 1 - \frac{\theta^n}{\theta^n}\left(1+\frac{-z/\theta}{n}\right)^n$$

$$\to 1 - e^{-z/\theta}$$

which is the CDF of the Exponential distribution.
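As a sanity check (again a sketch assuming NumPy), the empirical CDF of $n(\theta-\hat\theta_n)$ can be compared with the limit $1-e^{-z/\theta}$:

```python
# Compare the empirical CDF of n*(theta - theta_hat_n) with 1 - exp(-z/theta).
# Uses max(u_1,...,u_n) =d theta * U**(1/n) with U ~ U(0,1).
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 3.0, 2000, 200_000
z = n * (theta - theta * rng.random(reps) ** (1.0 / n))   # n*(theta - max(u_i))
for c in (0.5, 1.0, 2.0, 5.0):
    print(f"z = {c}*theta:  empirical {np.mean(z <= c * theta):.4f}"
          f"   limit {1 - np.exp(-c):.4f}")
```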

I hope this provides some direction.


This is getting close, but I'm specifically interested in situations where the bias shrinks faster than the variance.
Mike Izbicki

@MikeIzbicki Hmm... The convergence of the bias depends on the first moment of the distribution, and the (square root of the) variance is also a "first-order" magnitude. At this point I am not sure whether this can happen, since it seems to imply that the moments of the limiting distribution would "arise" at mutually incompatible convergence rates... I'll think about it.
Alecos Papadopoulos


Following the comments under my other answer (and looking again at the title of the OP's question!), here is a not-very-rigorous theoretical exploration of the issue.

We want to determine whether the bias $B(\hat\theta_n)=E(\hat\theta_n)-\theta$ may have a different convergence rate than the square root of the variance,

$$B(\hat\theta_n)=O(1/n^{\delta}), \qquad \sqrt{\operatorname{Var}(\hat\theta_n)}=O(1/n^{\gamma}), \qquad \gamma\neq\delta\;?$$

We have (taking the true value to be $\theta=0$ for notational convenience, i.e. working with the centered estimator, so that $B(\hat\theta_n)=E(\hat\theta_n)$; the variance is unaffected by this shift)

$$B(\hat\theta_n)=O(1/n^{\delta}) \implies \lim n^{\delta}E(\hat\theta_n) < K \implies \lim n^{2\delta}\big[E(\hat\theta_n)\big]^2 < K'$$

$$\implies \big[E(\hat\theta_n)\big]^2 = O(1/n^{2\delta}) \tag{1}$$

while

$$\sqrt{\operatorname{Var}(\hat\theta_n)}=O(1/n^{\gamma}) \implies \lim n^{\gamma}\sqrt{E(\hat\theta_n^2)-[E(\hat\theta_n)]^2} < M$$

$$\implies \lim n^{2\gamma}E(\hat\theta_n^2) - n^{2\gamma}[E(\hat\theta_n)]^2 < M'$$

$$\implies \lim n^{2\gamma}E(\hat\theta_n^2) - \lim n^{2\gamma}[E(\hat\theta_n)]^2 < M' \tag{2}$$

We see that $(2)$ may hold if

A) both components are $O(1/n^{2\gamma})$, in which case we can only have $\gamma=\delta$.

B) But it may also hold if

$$\lim n^{2\gamma}[E(\hat\theta_n)]^2 \to 0 \implies [E(\hat\theta_n)]^2 = o(1/n^{2\gamma}) \tag{3}$$

For (3) to be compatible with (1), we must have

$$n^{2\gamma} < n^{2\delta} \implies \delta > \gamma$$

So it appears that in principle it is possible to have the Bias converging at a faster rate than the square root of the variance. But we cannot have the square root of the variance converging at a faster rate than the Bias.
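Case B is also easy to exhibit numerically (a sketch assuming NumPy; the normal-variance MLE is used only as a convenient example with $\delta=1$ and $\gamma=1/2$, since its bias is $-\sigma^2/n$ while its standard deviation is of order $\sigma^2\sqrt{2/n}$):

```python
# Estimate delta and gamma for the normal-variance MLE,
# sigma2_hat = (1/n) * sum (x_i - xbar)^2.
# Its bias is -sigma^2/n (delta = 1) and its sd is ~ sigma^2*sqrt(2/n) (gamma = 1/2).
# Uses the exact fact that n*sigma2_hat/sigma^2 ~ chi-square with n-1 degrees of freedom.
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 4.0
ns = np.array([100, 400, 1600, 6400])
reps = 2_000_000

bias, sd = [], []
for n in ns:
    s2_mle = sigma2 * rng.chisquare(n - 1, size=reps) / n   # draws of the MLE
    bias.append(abs(s2_mle.mean() - sigma2))
    sd.append(s2_mle.std())

delta = -np.polyfit(np.log(ns), np.log(bias), 1)[0]   # expect about 1
gamma = -np.polyfit(np.log(ns), np.log(sd), 1)[0]     # expect about 0.5
print(f"delta ~ {delta:.2f},  gamma ~ {gamma:.2f}")
```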


How would you reconcile this with the existence of unbiased estimators like ordinary least squares? In that case, $B(\hat\theta)=0$, but $\operatorname{Var}(\hat\theta)=O(1/n)$.
Mike Izbicki

@MikeIzbicki Is the concept of convergence/big-O applicable in this case? Because here $B(\hat\theta)$ is not "$O(\cdot)$-anything" to begin with.
Alecos Papadopoulos

In this case, $E\hat\theta=\theta$, so $B(\hat\theta)=E\hat\theta-\theta=0=O(1)=O(1/n^0)$.
Mike Izbicki

@MikeIzbicki But also $B(\hat\theta)=O(n)$ or $B(\hat\theta)=O(1/n)$ or any other rate you care to write down. So which one is the rate of convergence here?
Alecos Papadopoulos

@MikeIzbicki I have corrected my answer to show that it is possible in principle to have the Bias converging faster, although I still think the "zero-bias" example is problematic.
Alecos Papadopoulos