应用PCA之前的常用技术是从样本中减去平均值。如果不这样做,第一个特征向量将是均值。我不确定您是否完成了,但让我来谈谈。如果我们用MATLAB代码讲:这是
clear, clf
clc
%% Let us draw a line
scale = 1;
x = scale .* (1:0.25:5);
y = 1/2*x + 1;
%% and add some noise
y = y + rand(size(y));
%% plot and see
subplot(1,2,1), plot(x, y, '*k')
axis equal
%% Put the data in columns and see what SVD gives
A = [x;y];
[U, S, V] = svd(A);
hold on
plot([mean(x)-U(1,1)*S(1,1) mean(x)+U(1,1)*S(1,1)], ...
[mean(y)-U(2,1)*S(1,1) mean(y)+U(2,1)*S(1,1)], ...
':k');
plot([mean(x)-U(1,2)*S(2,2) mean(x)+U(1,2)*S(2,2)], ...
[mean(y)-U(2,2)*S(2,2) mean(y)+U(2,2)*S(2,2)], ...
'-.k');
title('The left singular vectors found directly')
%% Now, subtract the mean and see its effect
A(1,:) = A(1,:) - mean(A(1,:));
A(2,:) = A(2,:) - mean(A(2,:));
[U, S, V] = svd(A);
subplot(1,2,2)
plot(x, y, '*k')
axis equal
hold on
plot([mean(x)-U(1,1)*S(1,1) mean(x)+U(1,1)*S(1,1)], ...
[mean(y)-U(2,1)*S(1,1) mean(y)+U(2,1)*S(1,1)], ...
':k');
plot([mean(x)-U(1,2)*S(2,2) mean(x)+U(1,2)*S(2,2)], ...
[mean(y)-U(2,2)*S(2,2) mean(y)+U(2,2)*S(2,2)], ...
'-.k');
title('The left singular vectors found after subtracting mean')
从图中可以看出,如果您想更好地分析(协)方差,则应该从数据中减去平均值。然后,这些值将不在10-100和0.1-1之间,但它们的平均值将均为零。方差将作为特征值(或奇异值的平方)找到。在我们减去均值的情况下,发现的本征向量不受维数规模的影响,而在没有维数的情况下,发现的特征向量不受此影响。例如,我已经测试并观察了以下内容,这些内容表明减去均值可能对您的情况很重要。因此,问题可能不是由方差引起的,而是由平移差异引起的。
% scale = 0.5, without subtracting mean
U =
-0.5504 -0.8349
-0.8349 0.5504
% scale = 0.5, with subtracting mean
U =
-0.8311 -0.5561
-0.5561 0.8311
% scale = 1, without subtracting mean
U =
-0.7327 -0.6806
-0.6806 0.7327
% scale = 1, with subtracting mean
U =
-0.8464 -0.5325
-0.5325 0.8464
% scale = 100, without subtracting mean
U =
-0.8930 -0.4501
-0.4501 0.8930
% scale = 100, with subtracting mean
U =
-0.8943 -0.4474
-0.4474 0.8943