14

If $X_1, \ldots, X_n \sim N(\mu, 1)$ are IID, compute $E(X_1 \mid T)$, where $T = \sum_i X_i$.


Attempt: please check whether the following is correct.

Say we take these conditional expectations and sum them, so that

$$\sum_i E(X_i \mid T) = E\left(\sum_i X_i \,\middle|\, T\right) = T.$$

This implies that each $E(X_i \mid T) = \frac{T}{n}$, since $X_1, \ldots, X_n$ are IID.

Therefore, $E(X_1 \mid T) = \frac{T}{n}$. Is this correct?
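(Not part of the original question.) A quick Monte Carlo sketch can sanity-check the claimed answer; the values of $n$, $\mu$, the number of replications, and the bin count below are arbitrary choices of mine. It bins simulated draws by the value of $T$ and compares the within-bin average of $X_1$ to $T/n$.

```python
import numpy as np

# Monte Carlo sanity check: estimate E(X1 | T) by binning simulated draws on T
# and compare the within-bin average of X1 to T/n.  n, mu, reps, and the number
# of bins are arbitrary choices for this sketch.
rng = np.random.default_rng(0)
n, mu, reps = 5, 2.0, 200_000

x = rng.normal(mu, 1.0, size=(reps, n))   # X1, ..., Xn iid N(mu, 1)
t = x.sum(axis=1)                         # T = sum_i Xi

edges = np.quantile(t, np.linspace(0, 1, 21))   # 20 equal-probability bins of T
idx = np.digitize(t, edges[1:-1])
for k in range(20):
    m = idx == k
    print(f"T ~ {t[m].mean():6.2f}   E[X1 | T] est.: {x[m, 0].mean():6.3f}"
          f"   T/n: {t[m].mean() / n:6.3f}")
```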


2
The $X_i$ are not independent conditional on $T$, but they do have an exchangeable joint distribution. This implies that their conditional expectations are all equal (to $T/n$).
Jarle Tufto

@JarleTufto: What do you mean by "exchangeable joint distribution"? The joint distribution of $X_i \mid T$?
learning

2
It means that the joint distribution of $X_1, X_2, X_3$ is the same as that of $X_2, X_3, X_1$ (and all other permutations). See en.wikipedia.org/wiki/Exchangeable_random_variables. Or see @whuber's answer!
Jarle Tufto

2
Notably, the answer does not depend on the distribution of $X_1, \ldots, X_n$.
StubbornAtom

Answers:


11

The idea is right, but there is a question of expressing it more rigorously, so I will focus on the notation and on exposing the essence of the idea.


Let's begin with the idea of exchangeability.

A random variable $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is exchangeable when the distributions of the permuted variables $\mathbf{X}^\sigma = (X_{\sigma(1)}, X_{\sigma(2)}, \ldots, X_{\sigma(n)})$ are all the same for every possible permutation $\sigma$.

Clearly iid implies exchangeable.

As a matter of notation, write $X_i^\sigma = X_{\sigma(i)}$ for the $i$th component of $\mathbf{X}^\sigma$ and let

$$T^\sigma = \sum_{i=1}^n X_i^\sigma = \sum_{i=1}^n X_i = T.$$

Let $j$ be any index and let $\sigma$ be any permutation of the indices that sends $1$ to $j = \sigma(1)$. (Such a $\sigma$ exists because one can always just swap $1$ and $j$.) Exchangeability of $\mathbf{X}$ implies

$$E[X_1 \mid T] = E[X_1^\sigma \mid T^\sigma] = E[X_j \mid T],$$

because (in the first equality) we have merely replaced $\mathbf{X}$ by the identically distributed vector $\mathbf{X}^\sigma$. This is the crux of the matter.

Consequently

$$T = E[T \mid T] = E\left[\sum_{i=1}^n X_i \,\middle|\, T\right] = \sum_{i=1}^n E[X_i \mid T] = \sum_{i=1}^n E[X_1 \mid T] = n\,E[X_1 \mid T],$$

whence

$$E[X_1 \mid T] = \frac{1}{n} T.$$
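The crux step, $E[X_1 \mid T] = E[X_j \mid T]$, can also be illustrated numerically (a sketch of mine, not part of the answer): it is equivalent to $E[X_1\,g(T)] = E[X_j\,g(T)]$ for all bounded test functions $g$, which characterizes the conditional expectation. The sample size, mean, and test functions below are arbitrary.

```python
import numpy as np

# Sketch: exchangeability makes (X1, T) and (Xj, T) identically distributed, so
# E[X1 * g(T)] = E[Xj * g(T)] for any bounded g.  That property characterizes
# the conditional expectation, which is what forces E[X1 | T] = E[Xj | T].
rng = np.random.default_rng(1)
n, reps = 4, 500_000
x = rng.normal(1.0, 1.0, size=(reps, n))  # iid, hence exchangeable
t = x.sum(axis=1)

for g in (np.sin, np.tanh, lambda s: (s > n).astype(float)):
    # one estimate of E[Xj * g(T)] per coordinate j; the n numbers nearly agree
    print([round(float(np.mean(x[:, j] * g(t))), 4) for j in range(n)])
```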


4

This is not a proof (and +1 to @whuber's answer), but it's a geometric way to build some intuition as to why $E(X_1 \mid T) = T/n$ is a sensible answer.

Write $X = (X_1, \ldots, X_n)^T$ and $\mathbf{1} = (1, \ldots, 1)^T$, so that $T = \mathbf{1}^T X$. We're then conditioning on the event that $\mathbf{1}^T X = t$ for some $t \in \mathbb{R}$, so this is like drawing multivariate Gaussians supported on $\mathbb{R}^n$ but only looking at the ones that end up in the affine space $\{x \in \mathbb{R}^n : \mathbf{1}^T x = t\}$. Then we want to know the average of the $x_1$ coordinates of the points that land in this affine space (never mind that it's a measure zero subset).

We know that

$$X \sim N(\mu \mathbf{1}, I),$$

so we've got a spherical Gaussian with a constant mean vector, and the mean vector $\mu \mathbf{1}$ lies on the same line as the normal vector of the hyperplane $x^T \mathbf{1} = 0$.

This gives us a situation like the picture below:

[figure: the 2-D picture described in the next paragraph, showing the density of $X$, the line $x_1 = x_2$, and the line $x_1 + x_2 = t$]

Let $H_t := \{x \in \mathbb{R}^2 : x^T \mathbf{1} = t\}$. The density of $X$ is symmetric over the line $x_1 = x_2$, since $E(X) \in \operatorname{span} \mathbf{1}$, so its restriction to $H_t$ is also symmetric over the same line, and the point around which it is symmetric is the intersection of the lines $x_1 + x_2 = t$ and $x_1 = x_2$. This happens for $x = (t/2, t/2)$.

To picture $E(X_1 \mid T)$ we can imagine sampling over and over, and then whenever we get a point in $H_t$ we take just the $x_1$ coordinate and save it. From the symmetry of the density on $H_t$, the distribution of those $x_1$ coordinates will also be symmetric, and it'll have the same center point $t/2$. The mean of a symmetric distribution is its center of symmetry, so this means $E(X_1 \mid T) = T/2$, and $E(X_1 \mid T) = E(X_2 \mid T)$ since $X_1$ and $X_2$ can be exchanged without affecting anything.
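Here is a rough numerical version of that 2-D picture (my own sketch, with arbitrary values of $\mu$, $t$, and a slab width $\varepsilon$ standing in for the measure-zero line): keep only the draws whose coordinates sum to approximately $t$ and average their $x_1$ coordinates.

```python
import numpy as np

# 2-D sketch: draw spherical Gaussians with mean mu * (1, 1), keep the draws
# whose coordinates sum to within eps of t, and average their x1 coordinates.
# mu, t, and eps are arbitrary values for this illustration.
rng = np.random.default_rng(2)
mu, t, eps = 1.0, 3.0, 0.05

x = rng.normal(mu, 1.0, size=(2_000_000, 2))
near = np.abs(x.sum(axis=1) - t) < eps          # a thin slab around x1 + x2 = t
print(near.sum(), x[near, 0].mean(), t / 2)     # the x1 average is close to t/2
```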

In higher dimensions this gets hard (or impossible) to visualize exactly, but the same idea applies: we've got a spherical Gaussian with a mean in the span of $\mathbf{1}$, and we're looking at an affine subspace perpendicular to it. The balance point of the distribution on that subspace will still be the intersection of $\operatorname{span} \mathbf{1}$ and $\{x : x^T \mathbf{1} = t\}$, which is at $x = (t/n, \ldots, t/n)$, and the density is still symmetric, so this balance point is again the mean.

Again, that's not a proof, but I think it gives a decent idea of why you'd expect this behavior in the first place.


Beyond this, as some such as @StubbornAtom have noted, this doesn't actually require $X$ to be Gaussian. In 2-D, note that if $X$ is exchangeable then $f(x_1, x_2) = f(x_2, x_1)$ (more generally, $f(x) = f(x^\sigma)$), so $f$ must be symmetric over the line $x_1 = x_2$. We also have $E(X) \in \operatorname{span} \mathbf{1}$, so everything I said regarding the "key idea" in the first picture still holds exactly. Here's an example where the $X_i$ are iid from a Gaussian mixture model (a numerical sketch follows the figure). All the lines have the same meaning as before.

[figure: the same picture as above, but with the $X_i$ iid from a Gaussian mixture model]
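A small simulation along these lines (mine, with made-up mixture parameters rather than those of the figure) suggests that $E(X_1 \mid T) \approx T/n$ indeed still holds when the $X_i$ are iid draws from a Gaussian mixture.

```python
import numpy as np

# Sketch with non-Gaussian Xi: each Xi is an iid draw from a two-component
# Gaussian mixture (made-up parameters).  E(X1 | T) still tracks T/n.
rng = np.random.default_rng(3)
n, reps = 3, 400_000

comp = rng.random(size=(reps, n)) < 0.3                 # mixture component flags
x = np.where(comp,
             rng.normal(-2.0, 0.5, size=(reps, n)),
             rng.normal(3.0, 1.0, size=(reps, n)))
t = x.sum(axis=1)

edges = np.quantile(t, np.linspace(0, 1, 11))           # 10 equal-probability T bins
idx = np.digitize(t, edges[1:-1])
for k in range(10):
    m = idx == k
    print(f"E[X1 | T] est.: {x[m, 0].mean():7.3f}   T/n: {t[m].mean() / n:7.3f}")
```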


1

I think your answer is right, although I'm not entirely sure about the killer line in your proof, the claim that it holds "because they are i.i.d.". A more wordy way to the same solution is as follows:

Think about what $E(x_i \mid T)$ actually means. You know that you have a sample of $N$ readings and that their sum is $T$. What this actually means is that the underlying distribution they were sampled from no longer matters (you'll notice that at no point did your proof use the fact that they were sampled from a Gaussian).

$E(x_i \mid T)$ is the answer to the question: if you sampled from your sample, with replacement, many times, what would be the average you obtained? This is the sum over all the possible values multiplied by their probabilities, i.e. $\sum_{i=1}^N \frac{1}{N} x_i$, which equals $T/N$.
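A tiny simulation of that resampling picture (my sketch, with $T$ taken as the sum as in the question): the expected value of a single draw made uniformly with replacement from the observed sample is the sample average $T/N$.

```python
import numpy as np

# Sketch of the resampling picture, with T taken as the sum (as in the question):
# one value drawn uniformly with replacement from the observed sample has
# expectation sum(sample) / N = T / N.
rng = np.random.default_rng(4)
sample = rng.normal(0.0, 1.0, size=10)      # one observed sample of N = 10 readings
T, N = sample.sum(), sample.size

draws = rng.choice(sample, size=1_000_000, replace=True)
print(draws.mean(), T / N)                  # the two numbers agree closely
```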


1
Note that the $x_i \mid T$ can't be i.i.d., as they are constrained to sum to $T$. If you know $n-1$ of them, you know the $n$th one too.
jbowman

Yes, but I did something more subtle: I said that if you sampled multiple times with replacement, each sample would be an i.i.d. sample from a discrete distribution.
gazza89

Sorry! Misplaced the comment; it should have been addressed to the OP. It was meant in reference to the statement "This implies that each $E(X_i \mid T) = \frac{T}{n}$, since $X_1, \ldots, X_n$ are IID."
jbowman