Analog of Pearson correlation for 3 variables


17

I am interested in whether a "correlation" of three variables means anything, and if so, what it would be.

The Pearson product-moment correlation coefficient is

$$\frac{E\{(X-\mu_X)(Y-\mu_Y)\}}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}$$

Now the question for 3 variables: is

$$\frac{E\{(X-\mu_X)(Y-\mu_Y)(Z-\mu_Z)\}}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)\operatorname{Var}(Z)}}$$

anything?

In R it seems to produce something interpretable:

> a <- rnorm(100); b <- rnorm(100); c <- rnorm(100)
> mean((a-mean(a)) * (b-mean(b)) * (c-mean(c))) / (sd(a) * sd(b) * sd(c))
[1] -0.3476942

Normally we look at the correlation between 2 variables given fixed values of a third variable. Could someone clarify?


2
1) In the bivariate Pearson formula, if "E" (the mean in your code) implies division by n, then the st. deviations must also be based on n (not n-1). 2) Let all three variables be the same variable. In this case we would expect the correlation to be 1 (as in the bivariate case), but alas ...
ttnphns '13

For the trivariate normal distribution it is zero regardless of the correlations.
Ray Koopman

1
I really think the title would benefit from being changed to "Pearson correlation analysis of 3 variables" or similar - it would make links here more informative
Silverfish

1
@Silverfish I agree! I've updated the title, thank you.
PascalVKooten '18

Answers:


11

It is indeed something. To find out what, we need to examine what we know about correlation itself.

  1. The correlation matrix of a vector-valued random variable $X=(X_1,X_2,\ldots,X_p)$ is the standardized version of its variance-covariance matrix, or "variance" of $X$ for short. That is, each $X_i$ is replaced by its recentered, rescaled version.

  2. The covariance of $X_i$ and $X_j$ is the expectation of the product of their centered versions. That is, writing $X_i' = X_i - E[X_i]$ and $X_j' = X_j - E[X_j]$, we have

    $$\operatorname{Cov}(X_i, X_j) = E[X_i' X_j'].$$
  3. The variance of $X$, which I will write $\operatorname{Var}(X)$, is not a single number. It is the array of values

    $$\operatorname{Var}(X)_{ij} = \operatorname{Cov}(X_i, X_j).$$
  4. The way to think of the covariance for the intended generalization is to consider it a tensor. That means it is an entire collection of quantities $v_{ij}$, indexed by $i$ and $j$ ranging from $1$ to $p$, whose values change in a particularly simple, predictable way when $X$ undergoes a linear transformation. Specifically, let $Y=(Y_1,Y_2,\ldots,Y_q)$ be another vector-valued random variable defined by

    $$Y_i = \sum_{j=1}^p a_{ij} X_j.$$

    The constants $a_{ij}$ ($i$ and $j$ are indexes; $j$ is not a power) form a $q\times p$ array $A = (a_{ij})$, $i = 1, \ldots, q$, $j = 1, \ldots, p$. Linearity of expectation implies

    $$\operatorname{Var}(Y)_{ij} = \sum_{k,l} a_{ik} a_{jl} \operatorname{Var}(X)_{kl}.$$

    In matrix notation,

    $$\operatorname{Var}(Y) = A \operatorname{Var}(X) A'.$$
  5. All the components of $\operatorname{Var}(X)$ actually are univariate variances, due to the Polarization Identity

    $$4\operatorname{Cov}(X_i, X_j) = \operatorname{Var}(X_i + X_j) - \operatorname{Var}(X_i - X_j).$$

    This tells us that if you understand variances of univariate random variables, then you already understand covariances of bivariate variables: they are "just" linear combinations of variances.
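To make points (4) and (5) concrete, here is a quick numerical check. It is my own sketch in Python/NumPy (the thread itself uses R and Matlab), with an arbitrary linear map $A$; both identities are algebraic, so they hold exactly for sample moments, up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 10_000))       # rows are the variables X1, X2, X3
A = np.array([[1.0, 2.0, 0.0],
              [0.5, -1.0, 3.0]])       # an arbitrary 2x3 linear map
Y = A @ X

# (4): Var(Y) = A Var(X) A'  (sample covariance is bilinear, so this is exact)
assert np.allclose(np.cov(Y), A @ np.cov(X) @ A.T)

# (5): 4 Cov(X1, X2) = Var(X1 + X2) - Var(X1 - X2)
x1, x2 = X[0], X[1]
cov12 = np.mean((x1 - x1.mean()) * (x2 - x2.mean()))
assert np.isclose(4 * cov12, np.var(x1 + x2) - np.var(x1 - x2))
```

Note the consistent divisors: `np.cov` uses n-1 on both sides of the first identity, while the hand-computed covariance and `np.var` both use n in the second, echoing ttnphns's first comment.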


The expression in the question is perfectly analogous: the variables have been standardized as in (1). We can understand what it represents by considering what it means for any variables, standardized or not. Replace each $X_i$ by its centered version, as in (2), and form quantities having three indexes,

$$\mu_3(X)_{ijk} = E[X_i' X_j' X_k'].$$

These are the central (multivariate) moments of degree 3. As in (4), they form a tensor: when $Y = AX$, then

$$\mu_3(Y)_{ijk} = \sum_{l,m,n} a_{il} a_{jm} a_{kn} \mu_3(X)_{lmn}.$$

The indexes in this triple sum range over all combinations of integers from 1 through p.

The analog of the Polarization Identity is

$$24\,\mu_3(X)_{ijk} = \mu_3(X_i + X_j + X_k) - \mu_3(X_i - X_j + X_k) - \mu_3(X_i + X_j - X_k) + \mu_3(X_i - X_j - X_k),$$

where $\mu_3$ applied to a single random variable denotes its univariate central third moment.
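This trivariate Polarization Identity can be checked numerically. The following is a hedged Python/NumPy sketch of mine (not part of the original answer), using deliberately skewed simulated data; since the identity is algebraic and centering commutes with sums, it holds exactly for sample moments.

```python
import numpy as np

rng = np.random.default_rng(2)
x, y, z = rng.exponential(size=(3, 5000))   # deliberately skewed samples

def mu3_sum(*terms):
    """Univariate central third moment of the sum of the given samples."""
    s = sum(terms)
    return np.mean((s - s.mean()) ** 3)

def mu3_ijk(a, b, c):
    """Trivariate central third moment E[a' b' c']."""
    return np.mean((a - a.mean()) * (b - b.mean()) * (c - c.mean()))

# 24 mu3(X)_ijk = mu3(Xi+Xj+Xk) - mu3(Xi-Xj+Xk) - mu3(Xi+Xj-Xk) + mu3(Xi-Xj-Xk)
lhs = 24 * mu3_ijk(x, y, z)
rhs = (mu3_sum(x, y, z) - mu3_sum(x, -y, z)
       - mu3_sum(x, y, -z) + mu3_sum(x, -y, -z))
assert np.isclose(lhs, rhs)
```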

We may therefore think of $\mu_3(X)$ as being the multivariate skewness of $X$. It is a tensor of rank three (that is, with three indices) whose values are linear combinations of the skewnesses of various sums and differences of the $X_i$. If we were to seek interpretations, then, we would think of these components as measuring in $p$ dimensions whatever the skewness is measuring in one dimension. In many cases,

  • The first moments measure the location of a distribution;

  • The second moments (the variance-covariance matrix) measure its spread;

  • The standardized second moments (the correlations) indicate how the spread varies in p-dimensional space; and

  • The standardized third and fourth moments are taken to measure the shape of a distribution relative to its spread.

To elaborate on what a multidimensional "shape" might mean, observe that we can understand PCA as a mechanism to reduce any multivariate distribution to a standard version located at the origin and with equal spreads in all directions. After PCA is performed, then, $\mu_3$ would provide the simplest indicators of the multidimensional shape of the distribution. These ideas apply equally well to data as to random variables, because data can always be analyzed in terms of their empirical distribution.
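As an illustration of this recipe on data, the following Python/NumPy sketch (my own, with arbitrary simulated data, not from the original answer) reduces a skewed sample to its standard version via PCA-style whitening and then assembles the rank-three tensor $\mu_3$ with a triple product:

```python
import numpy as np

rng = np.random.default_rng(3)
# Skewed, correlated 2-D data (exponential margins)
raw = rng.exponential(size=(1000, 2))
data = np.column_stack([raw[:, 0], raw[:, 0] + 0.5 * raw[:, 1]])

# Reduce to a "standard version": center, rotate to principal axes, whiten
centered = data - data.mean(axis=0)
eigval, eigvec = np.linalg.eigh(np.cov(centered, rowvar=False))
Z = centered @ eigvec / np.sqrt(eigval)   # equal unit spread in every direction

# mu3[i, j, k] = E[Z_i Z_j Z_k]: the simplest indicators of multidimensional shape
mu3 = np.einsum('ni,nj,nk->ijk', Z, Z, Z) / len(Z)
```

The whitened scores `Z` have identity covariance by construction, so any structure remaining in `mu3` describes asymmetry (higher-dimensional "skewness") rather than location or spread.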


Reference

Alan Stuart & J. Keith Ord, Kendall's Advanced Theory of Statistics Fifth Edition, Volume 1: Distribution Theory; Chapter 3, Moments and Cumulants. Oxford University Press (1987).


Appendix: Proof of the Polarization Identity

Let $x_1, \ldots, x_n$ be algebraic variables. There are $2^n$ ways to add and subtract all $n$ of them. When we raise each of these sums-and-differences to the $n$th power, pick a suitable sign for each of those results, and add them up, we will get a multiple of $x_1 x_2 \cdots x_n$.

More formally, let $S=\{1,-1\}^n$ be the set of all $n$-tuples of $\pm 1$, so that any element $s\in S$ is a vector $s=(s_1,s_2,\ldots,s_n)$ whose coefficients are all $\pm 1$. The claim is


$$2^n n!\, x_1 x_2 \cdots x_n = \sum_{s\in S} s_1 s_2 \cdots s_n \,(s_1 x_1 + s_2 x_2 + \cdots + s_n x_n)^n. \tag{1}$$

Indeed, the Multinomial Theorem states that the coefficient of the monomial $x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$ (where the $i_j$ are nonnegative integers summing to $n$) in the expansion of any term on the right hand side is

$$\binom{n}{i_1, i_2, \ldots, i_n} s_1^{i_1} s_2^{i_2} \cdots s_n^{i_n}.$$

In the sum $(1)$, the coefficients involving $x_1^{i_1}$ appear in pairs where one of each pair involves the case $s_1=1$, with coefficient proportional to $s_1$ times $s_1^{i_1}$, equal to $1$, and the other of each pair involves the case $s_1=-1$, with coefficient proportional to $-1$ times $(-1)^{i_1}$, equal to $(-1)^{i_1+1}$. They cancel in the sum whenever $i_1+1$ is odd. The same argument applies to $i_2, \ldots, i_n$. Consequently, the only monomials that occur with nonzero coefficients must have odd powers of all the $x_i$. The only such monomial is $x_1 x_2 \cdots x_n$. It appears with coefficient $\binom{n}{1,1,\ldots,1} = n!$ in all $2^n$ terms of the sum. Consequently its coefficient is $2^n n!$, QED.

We need take only half of each pair associated with $x_1$: that is, we can restrict the right hand side of $(1)$ to the terms with $s_1=1$ and halve the coefficient on the left hand side to $2^{n-1} n!$. That gives precisely the two versions of the Polarization Identity quoted in this answer for the cases $n=2$ and $n=3$: $2^{2-1}2! = 4$ and $2^{3-1}3! = 24$.

Of course the Polarization Identity for algebraic variables immediately implies it for random variables: let each $x_i$ be a random variable $X_i$. Take expectations of both sides. The result follows by linearity of expectation.
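For readers who want to test identity $(1)$ directly, here is a small stdlib-Python sketch of mine (not part of the original answer) that evaluates both sides at random points for several $n$:

```python
import math
import random
from itertools import product

def signed_power_sum(xs):
    """Right-hand side of (1): the signed sum over all 2^n sign patterns."""
    n = len(xs)
    total = 0.0
    for s in product([1, -1], repeat=n):
        total += math.prod(s) * sum(si * xi for si, xi in zip(s, xs)) ** n
    return total

random.seed(0)
for n in (2, 3, 4):
    xs = [random.uniform(-2.0, 2.0) for _ in range(n)]
    lhs = 2**n * math.factorial(n) * math.prod(xs)   # left-hand side of (1)
    assert math.isclose(lhs, signed_power_sum(xs), rel_tol=1e-9, abs_tol=1e-9)
```

Because both sides are the same polynomial, agreement at random points is exactly what the algebraic proof above guarantees.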


Well done on explaining so far! Multivariate skewness kind of makes sense. Could you perhaps add an example that would show the importance of this multivariate skewness? Either as an issue in statistical models, or, perhaps more interesting, what real-life case would be subject to multivariate skewness? :)
PascalVKooten

3

Hmmm. If we run...

a <- rnorm(100);
b <- rnorm(100);
c <- rnorm(100)
mean((a-mean(a))*(b-mean(b))*(c-mean(c)))/
  (sd(a) * sd(b) * sd(c))

it does seem to center on 0 (I haven't done a real simulation), but as @ttnphns alludes, running this (all variables the same)

a <- rnorm(100)
mean((a-mean(a))*(a-mean(a))*(a-mean(a)))/
  (sd(a) * sd(a) * sd(a))

also seems to center on 0, which certainly makes me wonder what use this could be.


2
The nonsense apparently comes from the fact that sd or variance is a function of squaring, as is covariance. But with 3 variables, cubing occurs in the numerator while the denominator remains based on the originally squared terms.
ttnphns

2
Is that the root of it (pun intended)? Numerator and denominator have the same dimensions and units, which cancel, so that alone doesn't make the measure poorly formed.
Nick Cox

3
@Nick That's right. This is simply one of the multivariate central third moments. It is one component of a rank-three tensor giving the full set of third moments (which is closely related to the order-3 component of the multivariate cumulant generating function). In conjunction with the other components it could be of some use in describing asymmetries (higher-dimensional "skewness") in the distribution. It's not what anyone would call a "correlation," though: almost by definition, a correlation is a second-order property of the standardized variable.
whuber

1

If you need to calculate "correlation" between three or more variables, you can't use Pearson's, as in this case it will be different for different orders of the variables; have a look here. If you are interested in linear dependency, or in how well the variables are fitted by a 3D line, you may use PCA: obtain the explained variance for the first PC, permute your data, and find the probability that this value may be due to random reasons. I've discussed something similar here (see Technical details below).

Matlab code

% Simulate our experimental data
x=normrnd(0,1,100,1);
y=2*x.*normrnd(1,0.1,100,1);
z=(-3*x+1.5*y).*normrnd(1,2,100,1);
% perform pca
[loadings, scores,variance]=pca([x,y,z]);
% Observed Explained Variance for first principal component
OEV1=variance(1)/sum(variance)
% perform permutations
permOEV1=[];
for iPermutation=1:1000
    permX=datasample(x,numel(x),'replace',false);
    permY=datasample(y,numel(y),'replace',false);
    permZ=datasample(z,numel(z),'replace',false);
    [loadings, scores,variance]=pca([permX,permY,permZ]);
    permOEV1(end+1)=variance(1)/sum(variance);
end

% Calculate permutation p-value (add-one correction so the estimate is never 0)
p_value = (sum(permOEV1 >= OEV1) + 1) / (numel(permOEV1) + 1)