Exponential family: observed vs. expected sufficient statistics

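My question arises from reading Minka's "Estimating a Dirichlet distribution", which states the following without proof, in the context of deriving a maximum-likelihood estimator for a Dirichlet distribution from observations of random vectors:

"As always with the exponential family, when the gradient is zero, the expected sufficient statistics are equal to the observed sufficient statistics."

I haven't seen maximum likelihood estimation in the exponential family presented in this way, nor have I found a suitable explanation in my search. Can someone offer insight into the relationship between observed and expected sufficient statistics, and perhaps help me understand maximum likelihood estimation as minimizing their difference?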

— Ben Bray

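This is a common assertion about the exponential family, but in my opinion it is usually stated in a way that may confuse the less experienced reader. Taken at face value, it could be read as saying "if our random variable follows a distribution in the exponential family, then if we take a sample and plug it into the sufficient statistic, we will obtain the true expected value of the statistic". If only it were so... Moreover, the statement does not take the sample size into account, which may cause further confusion.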

The density of a distribution in the (one-parameter) exponential family is

$f_X(x) = h(x)e^{\eta(\theta) T(x)}e^{-A(\theta)} \tag{1}$

where $T(x)$ is the sufficient statistic.

Since this is a density, it has to integrate to unity, so ( $S_x$ is the support of $X$ )

$\int_{S_x} h(x)e^{\eta(\theta) T(x)}e^{-A(\theta)}dx =1 \tag{2}$

Eq. $(2)$ holds for all $\theta$ , so we can differentiate both sides with respect to it:

$\frac {\partial}{\partial \theta} \int_{S_x} h(x)e^{\eta(\theta) T(x)}e^{-A(\theta)}dx =\frac {\partial (1)}{\partial \theta} =0 \tag{3}$

Interchanging the order of differentiation and integration, we obtain

$\int_{S_x} \frac {\partial}{\partial \theta} \left(h(x)e^{\eta(\theta) T(x)}e^{-A(\theta)}\right)dx =0 \tag{4}$

Carrying out the differentiation, we have

$\frac {\partial}{\partial \theta} \left(h(x)e^{\eta(\theta) T(x)}e^{-A(\theta)}\right) = f_X(x)\big[T(x)\eta'(\theta) - A'(\theta)\big] \tag{5}$

Inserting $(5)$ into $(4)$ we get

$\int_{S_x} f_X(x)\big[T(x)\eta'(\theta) - A'(\theta)\big]dx =0$

$\Rightarrow \eta'(\theta)E[T(X)] - A'(\theta) = 0 \Rightarrow E[T(X)] = \frac {A'(\theta)}{\eta'(\theta)} \tag{6}$
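As a quick sanity check of $(6)$, consider a worked example that is not part of the original argument: assume $X$ is exponentially distributed with rate $\theta > 0$ (a choice made here purely for illustration). Then

$f_X(x) = \theta e^{-\theta x} = e^{-\theta x}e^{\ln \theta}, \qquad x \geq 0$

so that $h(x) = 1$, $\eta(\theta) = -\theta$, $T(x) = x$ and $A(\theta) = -\ln\theta$. Plugging these into $(6)$ gives

$E[T(X)] = \frac {A'(\theta)}{\eta'(\theta)} = \frac {-1/\theta}{-1} = \frac 1\theta$

which is the familiar mean of that distribution.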

Now note: the left-hand side of $(6)$ is a real number. Hence the right-hand side must also be a real number, not a function. It must therefore be evaluated at a specific $\theta$ , and that should be the "true" $\theta$ , otherwise the left-hand side would not be the true expected value of $T(X)$ . To emphasize this we denote the true value by $\theta_0$ , and we rewrite $(6)$ as

$E_{\theta_0}[T(X)] = \frac {A'(\theta)}{\eta'(\theta)}\Big |_{\theta =\theta_0} \tag{6a}$

We now turn to maximum likelihood estimation. The log-likelihood for a sample of size $n$ is

$L(\theta \mid \mathbf x) = \sum_{i=1}^n\ln h(x_i) +\eta(\theta)\sum_{i=1}^nT(x_i) -nA(\theta)$

Setting its derivative with respect to $\theta$ equal to $0$ , we obtain the MLE

$\hat \theta(x) : \frac 1n\sum_{i=1}^nT(x_i) = \frac {A'(\theta)}{\eta'(\theta)}\Big |_{\theta =\hat \theta(x)} \tag {7}$
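To illustrate $(7)$ under an assumed model (the exponential distribution with rate $\theta$, an illustrative choice that is not part of the original question), write $f_X(x) = \theta e^{-\theta x}$ with $\eta(\theta) = -\theta$, $T(x) = x$ and $A(\theta) = -\ln\theta$. Then $A'(\theta)/\eta'(\theta) = 1/\theta$, and $(7)$ reads

$\frac 1n\sum_{i=1}^n x_i = \frac {1}{\hat \theta(x)} \Rightarrow \hat \theta(x) = \frac {n}{\sum_{i=1}^n x_i}$

that is, the reciprocal of the sample mean, which in a finite sample will generally differ from the true rate $\theta_0$.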

Compare $(7)$ with $(6a)$ . The right-hand sides are not equal, since we cannot claim that the MLE has hit the true value. So neither are the left-hand sides. But remember that eq. $(2)$ holds for all $\theta$ , and so for $\hat \theta$ as well. The steps in eqs. $(3)$, $(4)$, $(5)$, $(6)$ can therefore be taken with respect to $\hat \theta$ , and so we can write eq. $(6a)$ for $\hat \theta$ :

$E_{\hat\theta(x)}[T(X)] = \frac {A'(\theta)}{\eta'(\theta)}\Big |_{\theta =\hat\theta(x)} \tag{6b}$

which, combined with $(7)$ , leads us to the valid relation

$E_{\hat\theta(x)}[T(X)] = \frac 1n\sum_{i=1}^nT(x_i)$

which is what the assertion under examination really says: the expected value of the sufficient statistic under the MLE for the unknown parameters (in other words, the value of the first raw moment of the distribution that we will obtain if we use $\hat \theta(x)$ in place of $\theta$ ), equals (and it is not just approximated by) the average of the sufficient statistic as calculated from the sample $\mathbf x$ .

Moreover, only if the sample size is $n=1$ then we could accurately say, "the expected value of the sufficient statistic under the MLE equals the sufficient statistic".
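The relation can also be checked numerically. The following is a minimal sketch under assumptions that are not in the original discussion: the model is taken to be $f_X(x) = \theta x^{\theta - 1}$ on $(0,1]$, for which $h(x) = 1/x$, $\eta(\theta) = \theta$, $T(x) = \ln x$ and $A(\theta) = -\ln\theta$, so that $A'(\theta)/\eta'(\theta) = -1/\theta$. The function names, the seed, the sample size and the midpoint-rule integration are all illustrative choices.

```python
import math
import random

def sample_power_density(theta, n, rng):
    # Inverse-CDF sampling: F(x) = x**theta on (0, 1], so X = U**(1/theta).
    # 1 - rng.random() lies in (0, 1], which avoids log(0) later on.
    return [(1.0 - rng.random()) ** (1.0 / theta) for _ in range(n)]

def mle_theta(xs):
    # Eq. (7) for this model: (1/n) * sum(ln x_i) = -1/theta_hat.
    return -len(xs) / sum(math.log(x) for x in xs)

def expected_T(theta, cells=100_000):
    # Midpoint-rule approximation of
    # E_theta[ln X] = integral over (0, 1) of ln(x) * theta * x**(theta - 1) dx.
    h = 1.0 / cells
    total = 0.0
    for k in range(cells):
        x = (k + 0.5) * h
        total += math.log(x) * theta * x ** (theta - 1.0)
    return total * h

rng = random.Random(0)          # fixed seed, assumed only for reproducibility
theta_true = 2.0                # the "true" theta_0, known only because we simulate
xs = sample_power_density(theta_true, 500, rng)

theta_hat = mle_theta(xs)
observed = sum(math.log(x) for x in xs) / len(xs)   # sample average of T
expected_at_mle = expected_T(theta_hat)             # left-hand side of (6b)
expected_at_truth = expected_T(theta_true)          # left-hand side of (6a)

print("theta_hat         :", theta_hat)
print("observed mean of T:", observed)
print("E[T] under MLE    :", expected_at_mle)
print("E[T] under truth  :", expected_at_truth)
```

Up to integration error, the second and third printed values should agree, while the fourth (the true expectation, $-1/\theta_0$) is only approximated by the sample average.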

— Alecos Papadopoulos

Could you elaborate further on why the transition from 6a to 6b is valid?

— Theoden

@Theoden In between eq. $(2)$ and $(3)$ I write "eq. $(2)$ holds for all $\theta$ " - and therefore for $\hat \theta$ also. So all the steps in eq. $3,4,5,6$ can be taken with respect to $\hat \theta$ . I have repeated this remark in the text for clarity.

— Alecos Papadopoulos

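@AlecosPapadopoulos, your proof seems to suggest that what you said at the beginning, "if our random variable follows a distribution in the exponential family, then if we take a sample and plug it into the sufficient statistic, we will obtain the true expected value of the statistic", is correct. I mean, I can always do this for (2): replace it with the observed sufficient statistic and get the result. What am I missing here? I don't quite get it.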

— user10024395 '16

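@user136266 The true expected value of the statistic is $6a$ , and computing it requires knowing the parameter $\theta$ , which is unknown by design. So what we can actually compute is $6b$ , which is the expected value of the statistic under the assumption that our point estimate has hit the true value.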

— Alecos Papadopoulos

Could you explain why we can interchange the order of differentiation and integration in eq. (3)?

— Markus777 '17