I performed a principal components analysis (PCA) of six variables $A$, $B$, $C$, $D$, $E$ and $F$. If I understand correctly, unrotated PC1 tells me which linear combination of these variables describes/explains the most variance in the data, PC2 tells me which linear combination describes the next most variance, and so on.
I'm just curious -- is there any way of doing this backwards? Say I choose some linear combination of these variables -- e.g. $2A + 3B + 5C$ -- could I work out how much of the variance in the data it describes?
Answers:
If we start from the premise that all variables have been centred (standard practice in PCA), then the total variance in the data is the sum of squares:

$$T = \sum_{i=1}^{n}\sum_{j=1}^{q} X_{ij}^{2}$$

This is equal to the trace of the covariance matrix of the variables, which in turn equals the sum of the eigenvalues of the covariance matrix. This is the same quantity that PCA speaks of in terms of "explaining the data" -- that is, you want your PCs to explain the greatest proportion of the diagonal elements of the covariance matrix. Now, if we make this an objective function for a set of predicted values $\hat{X}_{ij}$, like so:

$$S = \sum_{i=1}^{n}\sum_{j=1}^{q} \left(X_{ij} - \hat{X}_{ij}\right)^{2}$$

then the first principal component minimises $S$ among all rank-1 fitted values $\hat{X}_{ij}$. So the appropriate quantity you are after seems to be

$$P = 1 - \frac{S}{T}$$

To use your example of $2A + 3B + 5C$, we first turn this weight vector into a unit vector by dividing by $\sqrt{2^{2} + 3^{2} + 5^{2}} = \sqrt{38}$, which gives the scores $Z_{i} = \tfrac{2}{\sqrt{38}}A_{i} + \tfrac{3}{\sqrt{38}}B_{i} + \tfrac{5}{\sqrt{38}}C_{i}$. Then we multiply the scores by the weight vector to get our rank-1 prediction:

$$\hat{X}_{i\cdot} = \left(\tfrac{2}{\sqrt{38}}Z_{i},\ \tfrac{3}{\sqrt{38}}Z_{i},\ \tfrac{5}{\sqrt{38}}Z_{i},\ 0,\ 0,\ 0\right)$$
Then we plug these estimates into $S$ to calculate $P$. You can also put this into matrix-norm notation, which may suggest a different generalisation. If we set $O$ as the $n \times q$ matrix of observed values of the variables ($q = 6$ in your case), and $E$ as the corresponding matrix of predictions, we can define the proportion of variance explained as

$$\frac{\lVert O \rVert_{2}^{2} - \lVert O - E \rVert_{2}^{2}}{\lVert O \rVert_{2}^{2}}$$

where $\lVert \cdot \rVert_{2}$ is the Frobenius matrix norm. So you could "generalise" this to some other kind of matrix norm, and you would get a different measure of "variation explained", although it won't be "variance" per se unless it is a sum of squares.
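To make the recipe concrete, here is a minimal NumPy sketch of the $P = 1 - S/T$ computation (the random data matrix is invented for illustration; the answer itself only prescribes the formulas):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))      # toy data: 100 samples, 6 variables
X = X - X.mean(axis=0)             # centre the variables (PCA convention)

w = np.array([2.0, 3.0, 5.0, 0.0, 0.0, 0.0])
w = w / np.linalg.norm(w)          # unit weight vector for 2A + 3B + 5C

T = np.sum(X**2)                   # total variance (sum of squares)
Z = X @ w                          # scores
X_hat = np.outer(Z, w)             # rank-1 prediction
S = np.sum((X - X_hat)**2)         # objective function
P = 1 - S / T
print(f"proportion of variance explained: {P:.4f}")
```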
Let's say I choose some linear combination of these variables -- e.g. $2A + 3B + 5C$ -- could I work out how much variance in the data this describes?
This question can be understood in two different ways, leading to two different answers.
A linear combination corresponds to a vector, which in your example is $\mathbf{w} = (2, 3, 5, 0, 0, 0)$. This vector, in turn, defines an axis in the 6D space of the original variables. What you are asking is, how much variance does projection on this axis "describe"? The answer is given via the notion of "reconstruction" of the original data from this projection, and measuring the reconstruction error (see Wikipedia on Fraction of variance unexplained). Turns out, this reconstruction can be reasonably done in two different ways, yielding two different answers.
Let $\mathbf{X}$ be the centered dataset ($n$ rows correspond to samples, $d$ columns correspond to variables), let $\boldsymbol{\Sigma}$ be its covariance matrix, and let $\mathbf{w}$ be a unit vector from $\mathbb{R}^{d}$. The total variance of the dataset is the sum of all $d$ variances, i.e. the trace of the covariance matrix: $T = \operatorname{tr}(\boldsymbol{\Sigma})$. The question is: what proportion of $T$ does $\mathbf{w}$ describe? The two answers given by @todddeluca and @probabilityislogic are both equivalent to the following: compute the projection $\mathbf{X}\mathbf{w}$, compute its variance and divide by $T$:

$$R^{2}_{\mathrm{first}} = \frac{\operatorname{Var}(\mathbf{X}\mathbf{w})}{T} = \frac{\mathbf{w}^{\top}\boldsymbol{\Sigma}\mathbf{w}}{\operatorname{tr}(\boldsymbol{\Sigma})}$$
This might not be immediately obvious, because e.g. @probabilityislogic suggests to consider the reconstruction $\mathbf{X}\mathbf{w}\mathbf{w}^{\top}$ and then to compute

$$\frac{\lVert \mathbf{X} \rVert^{2} - \lVert \mathbf{X} - \mathbf{X}\mathbf{w}\mathbf{w}^{\top} \rVert^{2}}{\lVert \mathbf{X} \rVert^{2}},$$

but a little algebra shows that this is the same quantity as $R^{2}_{\mathrm{first}}$.
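Here is a quick numerical check of this equivalence, a sketch on randomly generated centred data (the dataset is invented; only the two formulas come from the answers):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
X = X - X.mean(axis=0)                   # centred dataset
Sigma = X.T @ X / len(X)                 # covariance matrix (1/n convention)

w = np.array([2.0, 3.0, 5.0, 0.0, 0.0, 0.0])
w = w / np.linalg.norm(w)                # unit vector

# Direct form: variance of the projection over the total variance.
r2_direct = (w @ Sigma @ w) / np.trace(Sigma)

# Reconstruction form: X w w^T as suggested by @probabilityislogic.
X_rec = np.outer(X @ w, w)
r2_recon = (np.sum(X**2) - np.sum((X - X_rec)**2)) / np.sum(X**2)

print(np.isclose(r2_direct, r2_recon))   # True: the two forms agree
```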
Okay. Now consider the following example: $\mathbf{X}$ is a dataset in $d = 2$ dimensions with covariance matrix

$$\boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0.99 \\ 0.99 & 1 \end{pmatrix}$$

and let $\mathbf{w} = (0, 1)^{\top}$, the axis of the second variable. (The original answer illustrated this with a scatter plot: the data as blue dots, their projections onto $\mathbf{w}$ as red dots.)
The total variance is $T = 2$. The variance of the projection onto $\mathbf{w}$ (shown in red dots) is equal to $1$. So according to the above logic, the explained variance is equal to $1/2$. And in some sense it is: the red dots ("reconstruction") are far away from the corresponding blue dots, so a lot of the variance is "lost".
On the other hand, the two variables have correlation $0.99$ and so are almost identical; saying that one of them describes only $50\%$ of the total variance is weird, because each of them contains "almost all the information" about the other. We can formalize it as follows: given the projection $\mathbf{X}\mathbf{w}$, find the best possible reconstruction $\mathbf{X}\mathbf{w}\mathbf{v}^{\top}$ with $\mathbf{v}$ not necessarily the same as $\mathbf{w}$, and then compute the reconstruction error and plug it into the expression for the proportion of explained variance:

$$R^{2}_{\mathrm{second}} = \frac{\lVert \mathbf{X} \rVert^{2} - \min_{\mathbf{v}} \lVert \mathbf{X} - \mathbf{X}\mathbf{w}\mathbf{v}^{\top} \rVert^{2}}{\lVert \mathbf{X} \rVert^{2}}$$
It is a matter of straightforward algebra to use the regression solution for $\mathbf{v}$ (derived below) to find that the whole expression simplifies to

$$R^{2}_{\mathrm{second}} = \frac{\lVert \boldsymbol{\Sigma}\mathbf{w} \rVert^{2}}{\mathbf{w}^{\top}\boldsymbol{\Sigma}\mathbf{w} \cdot \operatorname{tr}(\boldsymbol{\Sigma})}$$

In the example above this equals $1.9801 / 2 \approx 0.99$, meaning that $\mathbf{w}$ describes $99\%$ of the total variance.
Note that if (and only if) $\mathbf{w}$ is one of the eigenvectors of $\boldsymbol{\Sigma}$, i.e. one of the principal axes, with eigenvalue $\lambda$ (so that $\boldsymbol{\Sigma}\mathbf{w} = \lambda\mathbf{w}$), then both approaches to compute $R^{2}$ coincide and reduce to the familiar PCA expression

$$R^{2} = \frac{\lambda}{\operatorname{tr}(\boldsymbol{\Sigma})}$$
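Both formulas are cheap to evaluate; here is a small sketch for the $2 \times 2$ example above (the function names are mine):

```python
import numpy as np

Sigma = np.array([[1.0, 0.99],
                  [0.99, 1.0]])
w = np.array([0.0, 1.0])                 # projection onto the second variable

def r2_first(Sigma, w):
    # Variance of the projection divided by the total variance.
    return (w @ Sigma @ w) / np.trace(Sigma)

def r2_second(Sigma, w):
    # Best-reconstruction version derived above.
    Sw = Sigma @ w
    return (Sw @ Sw) / ((w @ Sigma @ w) * np.trace(Sigma))

print(r2_first(Sigma, w))                # 0.5
print(r2_second(Sigma, w))               # 0.99005: ~99% of the total variance

# For an eigenvector both measures coincide at lambda / tr(Sigma):
lam, V = np.linalg.eigh(Sigma)
v1 = V[:, -1]                            # leading principal axis
print(r2_first(Sigma, v1), r2_second(Sigma, v1), lam[-1] / np.trace(Sigma))
```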
PS. See my answer here for an application of the derived formula to the special case of $\mathbf{w}$ being one of the basis vectors: Variance of the data explained by a single variable.
Finding the $\mathbf{v}$ minimizing the reconstruction error $\lVert \mathbf{X} - \mathbf{X}\mathbf{w}\mathbf{v}^{\top} \rVert^{2}$ is a regression problem (with $\mathbf{X}\mathbf{w}$ as the univariate predictor and $\mathbf{X}$ as the multivariate response). Writing $\boldsymbol{\Sigma} = \mathbf{X}^{\top}\mathbf{X}/n$ (any constant factor in the definition of the covariance matrix cancels in $R^{2}$), its solution is given by

$$\mathbf{v}^{\top} = \left((\mathbf{X}\mathbf{w})^{\top}\mathbf{X}\mathbf{w}\right)^{-1}(\mathbf{X}\mathbf{w})^{\top}\mathbf{X} = \frac{\mathbf{w}^{\top}\boldsymbol{\Sigma}}{\mathbf{w}^{\top}\boldsymbol{\Sigma}\mathbf{w}}$$

Next, the formula

$$R^{2} = \frac{\lVert \mathbf{X} \rVert^{2} - \lVert \mathbf{X} - \mathbf{X}\mathbf{w}\mathbf{v}^{\top} \rVert^{2}}{\lVert \mathbf{X} \rVert^{2}}$$

can be simplified, by expanding the squared norm, to

$$R^{2} = \frac{2\operatorname{tr}(\mathbf{X}^{\top}\mathbf{X}\mathbf{w}\mathbf{v}^{\top}) - \lVert \mathbf{X}\mathbf{w}\mathbf{v}^{\top} \rVert^{2}}{\lVert \mathbf{X} \rVert^{2}}$$

Plugging in the equation for $\mathbf{v}$, we obtain for the numerator:

$$2\operatorname{tr}(\mathbf{X}^{\top}\mathbf{X}\mathbf{w}\mathbf{v}^{\top}) - \lVert \mathbf{X}\mathbf{w}\mathbf{v}^{\top} \rVert^{2} = 2n\,\frac{\lVert \boldsymbol{\Sigma}\mathbf{w} \rVert^{2}}{\mathbf{w}^{\top}\boldsymbol{\Sigma}\mathbf{w}} - n\,\frac{\lVert \boldsymbol{\Sigma}\mathbf{w} \rVert^{2}}{\mathbf{w}^{\top}\boldsymbol{\Sigma}\mathbf{w}} = n\,\frac{\lVert \boldsymbol{\Sigma}\mathbf{w} \rVert^{2}}{\mathbf{w}^{\top}\boldsymbol{\Sigma}\mathbf{w}}$$

The denominator is equal to $\lVert \mathbf{X} \rVert^{2} = n \operatorname{tr}(\boldsymbol{\Sigma})$, resulting in the formula given above.
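As a sanity check on the algebra, this sketch fits $\mathbf{v}$ by ordinary least squares on random data and compares it, and the resulting $R^{2}$, with the closed-form expressions (the $1/n$ covariance convention is assumed, as above):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))
X = X - X.mean(axis=0)                     # centred dataset
Sigma = X.T @ X / len(X)

w = rng.normal(size=4)
w = w / np.linalg.norm(w)                  # arbitrary unit vector

z = X @ w                                  # univariate predictor Xw
v_lstsq, *_ = np.linalg.lstsq(z[:, None], X, rcond=None)
v_closed = (Sigma @ w) / (w @ Sigma @ w)   # v = Sigma w / (w' Sigma w)
print(np.allclose(v_lstsq.ravel(), v_closed))    # True

# R^2 from the explicit reconstruction vs. the simplified formula:
X_hat = np.outer(z, v_closed)
r2_recon = (np.sum(X**2) - np.sum((X - X_hat)**2)) / np.sum(X**2)
Sw = Sigma @ w
r2_formula = (Sw @ Sw) / ((w @ Sigma @ w) * np.trace(Sigma))
print(np.isclose(r2_recon, r2_formula))          # True
```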
Let the total variance, $T$, in a data set of $n$ vectors be the sum of squared errors (SSE) between the vectors in the data set and the mean vector of the data set,

$$T = \sum_{i=1}^{n} \lVert \mathbf{x}_{i} - \bar{\mathbf{x}} \rVert^{2}$$

where $\bar{\mathbf{x}}$ is the mean vector of the data set and $\lVert \cdot \rVert$ is the Euclidean norm.

Now let the predictor of $\mathbf{x}_{i}$, written $\hat{\mathbf{x}}_{i}$, be the projection of the vector $\mathbf{x}_{i}$ onto a unit vector $\mathbf{c}$:

$$\hat{\mathbf{x}}_{i} = (\mathbf{c}^{\top}\mathbf{x}_{i})\,\mathbf{c}$$

Then the SSE for a given $\mathbf{c}$ is

$$\mathrm{SSE}_{\mathbf{c}} = \sum_{i=1}^{n} \lVert \mathbf{x}_{i} - \hat{\mathbf{x}}_{i} \rVert^{2}$$

I think that if you choose $\mathbf{c}$ to minimize $\mathrm{SSE}_{\mathbf{c}}$, then $\mathbf{c}$ is the first principal component.

If instead you choose $\mathbf{c}$ to be the normalized version of the vector $(2, 3, 5, 0, 0, 0)$, then $T - \mathrm{SSE}_{\mathbf{c}}$ is the variance in the data described by using $\mathbf{c}$ as a predictor.
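A short NumPy sketch of this recipe (illustrative; centring the data first is my assumption, which makes the mean vector in the definition of $T$ zero):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))      # 200 observations of 6 variables
X = X - X.mean(axis=0)             # centre, so the mean vector is 0

T = np.sum(X**2)                   # total variance: SSE about the mean

c = np.array([2.0, 3.0, 5.0, 0.0, 0.0, 0.0])
c = c / np.linalg.norm(c)          # normalized version of (2, 3, 5, 0, 0, 0)

X_hat = np.outer(X @ c, c)         # project each x_i onto c
SSE_c = np.sum((X - X_hat)**2)
print("variance described by c:", T - SSE_c)
print("as a proportion of T:   ", (T - SSE_c) / T)
```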