Does MLE require iid data? Or just independent parameters?


16

Estimating parameters using maximum likelihood estimation (MLE) involves evaluating the likelihood function, which maps the probability of the sample ($X$) occurring to values ($x$) of the parameter ($\theta$), given a distribution family $P(X = x \mid \theta)$ over possible values of $\theta$ (note: am I right about this?). All the examples I have seen involve calculating $P(X = x \mid \theta)$ by taking the product of $F(X)$, where $F$ is the distribution with the local value for $\theta$ and $X$ is the sample (a vector).

Since we are just multiplying the data, does it follow that the data must be independent? For example, could we not use MLE to fit time-series data? Or do the parameters just have to be independent?

Answers:


14

The likelihood function is defined as the probability of an event $E$ (the data set $x$), expressed as a function of the model parameters $\theta$:

$$L(\theta; x) \propto P(\text{Event } E; \theta) = P(\text{observing } x; \theta).$$

Therefore, no assumption of independence of the observations is made. In the classical approach there is no definition of independence of parameters, since they are not random variables; some related concepts could be identifiability, parameter orthogonality, and independence of the maximum likelihood estimators (which are random variables).

Some examples:

(1). Discrete case. Suppose $x=(x_1,\dots,x_n)$ is a sample of (independent) discrete observations with $P(\text{observing } x_j;\theta)>0$; then

$$L(\theta;x) \propto \prod_{j=1}^n P(\text{observing } x_j;\theta).$$

In particular, if $x_j \sim \text{Binomial}(N,\theta)$, with $N$ known, we have

$$L(\theta;x) \propto \prod_{j=1}^n \theta^{x_j}(1-\theta)^{N-x_j}.$$
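As a quick numerical sketch of this binomial case (the sample `data` and the grid resolution are arbitrary, made-up choices), maximizing this likelihood recovers the closed-form MLE $\sum_j x_j / (nN)$:

```python
import math

# Hypothetical sample: number of successes out of N = 10 trials, per observation.
data = [3, 5, 4, 6, 2]
N = 10

def log_likelihood(theta):
    """log of prod_j theta^{x_j} (1 - theta)^{N - x_j} (binomial constants dropped)."""
    return sum(x * math.log(theta) + (N - x) * math.log(1 - theta) for x in data)

# Maximize over a fine grid of theta values in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=log_likelihood)

# Closed-form MLE: sum(x_j) / (n * N) = 20 / 50 = 0.4
print(theta_hat)  # → 0.4
```

The log-likelihood is concave here, so the grid maximizer coincides with the closed-form answer.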

(2). Continuous approximation. Let $x=(x_1,\dots,x_n)$ be a sample from a continuous random variable $X$, with distribution $F$ and density $f$, measured with error $\epsilon$; that is, you observe the sets $(x_j-\epsilon,\, x_j+\epsilon)$. Then

$$L(\theta;x) \propto \prod_{j=1}^n P\left[\text{observing } (x_j-\epsilon,\, x_j+\epsilon);\theta\right] = \prod_{j=1}^n \left[F(x_j+\epsilon;\theta) - F(x_j-\epsilon;\theta)\right]$$

For small $\epsilon$, this can be approximated (using the Mean Value Theorem) by

$$L(\theta;x) \propto \prod_{j=1}^n f(x_j;\theta)$$

For an example with the normal case, take a look at this.
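As a sketch of the iid continuous case just described (hypothetical data; the sample values are arbitrary): for the normal density, the product likelihood $\prod_j f(x_j;\theta)$ is maximized at the sample mean and the (biased) sample variance, which can be confirmed numerically:

```python
import math

# Hypothetical iid sample, assumed drawn from a Normal(mu, sigma^2) model.
x = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3]
n = len(x)

def neg_log_lik(mu, sigma):
    """Negative log of prod_j f(x_j; mu, sigma) under the normal density."""
    return sum(0.5 * math.log(2 * math.pi * sigma ** 2)
               + (xi - mu) ** 2 / (2 * sigma ** 2) for xi in x)

# Closed-form MLEs for the normal model: sample mean, biased sample variance.
mu_hat = sum(x) / n
var_hat = sum((xi - mu_hat) ** 2 for xi in x) / n

# Perturbing either parameter away from the MLE only worsens the fit.
best = neg_log_lik(mu_hat, math.sqrt(var_hat))
for d in (-0.01, 0.01):
    assert neg_log_lik(mu_hat + d, math.sqrt(var_hat)) > best
    assert neg_log_lik(mu_hat, math.sqrt(var_hat) + d) > best
```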

(3). Dependent and Markov models. Suppose $x=(x_1,\dots,x_n)$ is a set of possibly dependent observations, and let $f$ be the joint density of $x$; then

$$L(\theta;x) \propto f(x;\theta).$$

If, in addition, the Markov property is satisfied, then

$$L(\theta;x) \propto f(x;\theta) = f(x_1;\theta)\prod_{j=1}^{n-1} f(x_{j+1}\,\vert\, x_j;\theta).$$

Also take a look at this.
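A tiny sketch of the Markov factorization (hypothetical binary chain; the first observation is treated as fixed, so $f(x_1;\theta)$ contributes a constant): with a single "stay" probability $\theta$, maximizing $\prod_j f(x_{j+1}\vert x_j;\theta)$ recovers the observed fraction of stay transitions:

```python
import math

# Hypothetical binary chain; theta = P(next state equals current state).
chain = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]

def log_likelihood(theta):
    """log prod_j f(x_{j+1} | x_j; theta) for the stay/switch model."""
    return sum(math.log(theta if prev == cur else 1 - theta)
               for prev, cur in zip(chain, chain[1:]))

grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=log_likelihood)

# The MLE is just the observed fraction of "stay" transitions: 6/10.
stays = sum(prev == cur for prev, cur in zip(chain, chain[1:]))
print(theta_hat, stays / (len(chain) - 1))  # → 0.6 0.6
```

The likelihood is a product of transition probabilities, not of marginals, yet maximizing it is business as usual.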


3
From the moment you write the likelihood function as a product, you are implicitly making an assumption about the dependence among the observations. So for MLE, two assumptions are required: (a) one on the distribution of each individual outcome, and (b) one on the dependence among outcomes.

10

(+1) Very good question.

A minor point: MLE stands for maximum likelihood estimation (not multiple), which means that you just maximize the likelihood. It does not specify that the likelihood has to be produced by iid sampling.

If the dependence of the sampling can be written into the statistical model, you just write the likelihood accordingly and maximize it as usual.

One case worth mentioning when you assume dependence is multivariate Gaussian sampling (in time-series analysis, for example). The dependence between two Gaussian variables can be modelled by their covariance term, which you incorporate in the likelihood.

For a simplistic example, assume that you draw a sample of size 2 from correlated Gaussian variables with the same mean and variance. You would write the likelihood as

$$\frac{1}{2\pi\sigma^2\sqrt{1-\rho^2}} \exp\left(-\frac{z}{2\sigma^2(1-\rho^2)}\right),$$

where

$$z = (x_1-\mu)^2 - 2\rho(x_1-\mu)(x_2-\mu) + (x_2-\mu)^2.$$

This is not the product of the individual likelihoods. Still, you would maximize this with parameters (μ,σ,ρ) to get their MLE.
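A quick numerical check of that last point (pure Python, with arbitrary hypothetical values for $x_1$, $x_2$, $\mu$, $\sigma$): the bivariate likelihood above coincides with the product of the marginal normal densities exactly when $\rho = 0$, and differs otherwise:

```python
import math

def joint_lik(x1, x2, mu, sigma, rho):
    """Bivariate normal likelihood for one correlated pair (common mu, sigma)."""
    z = (x1 - mu) ** 2 - 2 * rho * (x1 - mu) * (x2 - mu) + (x2 - mu) ** 2
    norm = 2 * math.pi * sigma ** 2 * math.sqrt(1 - rho ** 2)
    return math.exp(-z / (2 * sigma ** 2 * (1 - rho ** 2))) / norm

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

x1, x2, mu, sigma = 1.0, 1.5, 1.2, 0.5
product = normal_pdf(x1, mu, sigma) * normal_pdf(x2, mu, sigma)

# rho = 0: the joint likelihood factorizes into the marginals ...
assert abs(joint_lik(x1, x2, mu, sigma, 0.0) - product) < 1e-12
# ... rho != 0: it does not.
assert abs(joint_lik(x1, x2, mu, sigma, 0.7) - product) > 1e-3
```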


2
These are good answers and examples. The only thing I would add, to see this in simple terms, is that likelihood estimation only requires that a model for the generation of the data be specified in terms of some unknown parameters and described in functional form.
Michael R. Chernick

(+1) Absolutely true! Do you have an example of a model that cannot be specified in those terms?
gui11aume

@gui11aume I think you are referring to my remark. I would say that I was not giving a direct answer to the question. The answer to the question is yes, because there are examples that can be shown where the likelihood function can be expressed when the data are generated by dependent random variables.
Michael R. Chernick

2
Examples where this cannot be done would be where the data are given without any description of the data-generating mechanism, or where the model is not presented in parametric form, such as when you are given two iid data sets and asked to test whether they come from the same distribution, specifying only that the distributions are absolutely continuous.
Michael R. Chernick
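For the two-sample setting just described, one likelihood-free route is a nonparametric statistic such as the two-sample Kolmogorov-Smirnov distance; a minimal pure-Python sketch (illustrative only, with no p-value computation):

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical CDFs, evaluated at every observed point."""
    def ecdf(sample, t):
        return sum(v <= t for v in sample) / len(sample)
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in points)

# Identical samples -> no gap; fully separated samples -> maximal gap.
print(ks_statistic([1, 2, 3], [1, 2, 3]))     # → 0.0
print(ks_statistic([1, 2, 3], [10, 11, 12]))  # → 1.0
```

No likelihood is written down anywhere: only absolute continuity of the underlying distributions is assumed.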

4

Of course, Gaussian ARMA models possess a likelihood, as their covariance function can be derived explicitly. This is basically an extension of gui11aume's answer to more than 2 observations. Minimal googling produces papers like this one where the likelihood is given in the general form.

Another, to an extent more intriguing, class of examples is given by multilevel random-effect models. If you have data of the form

$$y_{ij} = x_{ij}\beta + u_i + \epsilon_{ij},$$
where indices $j$ are nested within indices $i$ (think of students $j$ in classrooms $i$, say, for a classic application of multilevel models), then, assuming $\epsilon_{ij} \perp u_i$, the likelihood is
$$\ln L \propto \sum_i \ln \int \prod_j f(y_{ij}\,\vert\,\beta, u_i)\, \mathrm{d}F(u_i)$$
and is a sum over the likelihood contributions defined at the level of clusters, not individual observations. (Of course, in the Gaussian case, you can push the integrals around to produce an analytic ANOVA-like solution. However, if you have say a logit model for your response yij, then there is no way out of numerical integration.)
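A brute-force sketch of one cluster's likelihood contribution under an assumed random-intercept logit model (the data, parameter values, and quadrature scheme are all hypothetical choices for illustration): the integral over $u_i$ is approximated on a fixed grid.

```python
import math

def inv_logit(t):
    return 1.0 / (1.0 + math.exp(-t))

def cluster_lik(y, x, beta, tau, halfwidth=6.0, n_points=2001):
    """Riemann-sum approximation of the integral over u of
    prod_j f(y_ij | beta, u) * N(u; 0, tau^2) du for one cluster."""
    h = 2 * halfwidth * tau / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        u = -halfwidth * tau + k * h
        density = math.exp(-u ** 2 / (2 * tau ** 2)) / (tau * math.sqrt(2 * math.pi))
        lik = 1.0
        for yj, xj in zip(y, x):
            p = inv_logit(xj * beta + u)
            lik *= p if yj == 1 else 1 - p
        total += lik * density * h
    return total

# Hypothetical cluster: four Bernoulli responses with one scalar covariate each.
y = [1, 0, 1, 1]
x = [0.5, -0.2, 1.0, 0.3]
print(cluster_lik(y, x, beta=0.8, tau=1.0))
```

Note the unit of the product is the cluster, not the individual observation; as $\tau \to 0$ the integral collapses to the ordinary independent-logit likelihood at $u = 0$.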

2
Stask and @gui11aume, these three answers are nice, but I think they miss a point: what about the consistency of the MLE for dependent data?
Stéphane Laurent