How does cosine similarity change after a linear transformation?


9

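Is there a mathematical relationship between:

  • the cosine similarity $\mathrm{sim}(A, B)$ of two vectors $A$ and $B$, and
  • the cosine similarity $\mathrm{sim}(MA, MB)$ of $A$ and $B$ non-uniformly scaled by a given matrix $M$? Here $M$ is a given diagonal matrix whose diagonal elements are not all equal.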
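A minimal sketch of the two quantities in the question, assuming plain Python lists for vectors and representing the diagonal matrix $M$ by its diagonal entries (the names `cosine_similarity` and `scale` and the example numbers are illustrative only):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def scale(diag, v):
    """Apply the diagonal matrix diag(d_1, ..., d_n) to the vector v."""
    return [d * x for d, x in zip(diag, v)]

A = [1.0, 1.0]
B = [1.0, -1.0]
M = [1.0, 0.5]  # hypothetical non-uniform scale factors

print(cosine_similarity(A, B))                      # 0.0
print(cosine_similarity(scale(M, A), scale(M, B)))  # 0.6
```

Here a right angle (similarity 0) becomes an angle with cosine 0.6 once the second coordinate is halved, which is the kind of change the question asks about.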

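I tried to work through the calculations, but could not arrive at a simple or interesting link (expression). I wonder whether there is one.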


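For example, angles are not preserved under non-uniform scaling, but what is the relationship between the original angle and the angle after non-uniform scaling? What can be said about the connection between one set of vectors $S_1$ and another set of vectors $S_2$, where $S_2$ is obtained by non-uniformly scaling $S_1$?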


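@whuber, thanks! Yes, $M$ is a given matrix (a scaling matrix, hence diagonal, with no other restrictions). In a sense, I want to know what happens to a vector space (in terms of the cosine similarity of any pair of vectors) when it undergoes non-uniform scaling.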
turdus-merula

2
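It may be worth noting that if all the scale factors are nonnegative (as one would naturally assume), then every symmetric positive-definite matrix can be regarded as a "scaling" matrix. The relationship you seek is used extensively, in particular to study and describe distortion in map projections. There, interest focuses on the maximum and minimum angles on the earth's surface associated with two perpendicular directions on the map. There is a direct relationship between these angles and the ratio of the two scale factors.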
whuber

Answers:


8

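Because $M$ is so general, and the change in cosine similarity depends on the particular $A$ and $B$ and their relationship to $M$, there cannot be a definitive formula. However, there are computable limits to how much the cosine similarity can change. They can be found by extremizing the angle between $MA$ and $MB$ when the cosine similarity between $A$ and $B$ is a specified value, say $\cos(2\phi)$ (where $2\phi$ is the angle between $A$ and $B$). The answer tells us how much any angle $2\phi$ could possibly be bent by the transformation $M$.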

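The calculations can get messy. Some clever choices of notation, along with a few preliminary simplifications, reduce the effort. It turns out that the solution in two dimensions reveals everything we need to know. That is a tractable problem, depending on only one real variable $\theta$, which is easily solved with the techniques of calculus. A simple geometric argument then extends this solution to any number of dimensions $n$.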

Mathematical preliminaries

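By definition, the cosine of the angle between any two vectors $A$ and $B$ is obtained by normalizing them to unit length and taking their dot product. Thus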

$$\frac{A^\prime B}{\sqrt{(A^\prime A)(B^\prime B)}} = \cos(2\phi)$$

and, writing $\Sigma = M^\prime M$, the cosine of the angle between their images under the transformation $M$ is

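$$\frac{(MA)^\prime(MB)}{\sqrt{\big((MA)^\prime(MA)\big)\big((MB)^\prime(MB)\big)}} = \frac{A^\prime\Sigma B}{\sqrt{(A^\prime\Sigma A)(B^\prime\Sigma B)}}.\tag{1}$$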
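A quick numerical check of $(1)$, written as a sketch in plain Python: it computes the cosine once from $MA$ and $MB$ directly and once from $\Sigma = M^\prime M$ alone. The matrix and vectors are arbitrary illustrative values, and `matmul`, `transpose`, `apply` and the other helpers are hypothetical names defined here, not library functions.

```python
import math

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def apply(X, v):
    return [sum(x * y for x, y in zip(row, v)) for row in X]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cos_direct(M, A, B):
    # Left-hand side of (1): normalize MA and MB and take their dot product
    MA, MB = apply(M, A), apply(M, B)
    return dot(MA, MB) / math.sqrt(dot(MA, MA) * dot(MB, MB))

def cos_via_sigma(Sigma, A, B):
    # Right-hand side of (1): only Sigma = M'M is needed
    return dot(A, apply(Sigma, B)) / math.sqrt(dot(A, apply(Sigma, A)) * dot(B, apply(Sigma, B)))

M = [[2.0, 1.0, 0.0], [0.0, 1.0, -1.0], [0.5, 0.0, 3.0]]  # arbitrary illustrative matrix
Sigma = matmul(transpose(M), M)
A, B = [1.0, 2.0, -1.0], [0.5, -1.0, 2.0]
print(cos_direct(M, A, B), cos_via_sigma(Sigma, A, B))
```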

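Notice that only $\Sigma$ matters in the analysis, not $M$ itself. Therefore we may exploit the singular value decomposition (SVD) of $M$ to simplify the problem. Recall that this expresses $M$ as the product (from right to left) of an orthogonal matrix $V^\prime$, a diagonal matrix $D$, and another orthogonal matrix $U$: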

$$M = UDV^\prime.$$

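In other words, there is a privileged basis of vectors $e_1, \ldots, e_n$ (the columns of $V$) on which $M$ acts by separately rescaling each $e_i$ by the $i^\text{th}$ diagonal entry of $D$ (call it $d_i$) and then applying a rotation (or anti-rotation) $U$ to the result. That final rotation does not change any lengths or angles, so it cannot affect $\Sigma$. You can see this formally with the calculation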

$$\Sigma = M^\prime M = (UDV^\prime)^\prime(UDV^\prime) = VD^\prime(U^\prime U)DV^\prime = VD^2V^\prime.$$
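To see numerically that the final rotation $U$ drops out, the sketch below (a 2-D example with arbitrarily chosen angles and singular values; helper names are illustrative) builds $M = UDV^\prime$ from explicit rotation matrices and compares it with $DV^\prime$, which omits $U$:

```python
import math

def rotation(t):
    """2-D rotation matrix by angle t."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s], [s, c]]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def apply(X, v):
    return [sum(x * y for x, y in zip(row, v)) for row in X]

def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

D = [[3.0, 0.0], [0.0, 0.5]]   # singular values d_1 = 3, d_2 = 0.5 (illustrative)
V = rotation(0.3)
U = rotation(1.1)
M_full = matmul(U, matmul(D, transpose(V)))   # M = U D V'
M_no_U = matmul(D, transpose(V))              # same map without the final rotation U

Sigma_full = matmul(transpose(M_full), M_full)
Sigma_no_U = matmul(transpose(M_no_U), M_no_U)

A, B = [1.0, 0.2], [-0.4, 1.0]
print(cos_sim(apply(M_full, A), apply(M_full, B)))
print(cos_sim(apply(M_no_U, A), apply(M_no_U, B)))
```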

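Consequently, to study $\Sigma$ we may freely replace $M$ by any other matrix that produces the same values in $(1)$. Ordering the $e_i$ so that the $d_i$ decrease (and assuming $M$ is not identically zero), a nice choice is

$$M = \frac{1}{d_1}VDV^\prime.$$

Its $\Sigma$ is $VD^2V^\prime/d_1^2$, a positive multiple of the original $\Sigma$, so the values in $(1)$ do not change. The diagonal elements of $(1/d_1)D$ are

$$1 = \frac{d_1}{d_1} \ge \lambda_2 = \frac{d_2}{d_1} \ge \lambda_3 = \frac{d_3}{d_1} \ge \cdots \ge \lambda_n = \frac{d_n}{d_1} \ge 0.$$

Concretely, the effect of $M$ (in either its original or its changed form) on all angles is determined completely by the fact that

$$Me_i = \lambda_i e_i.$$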
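The normalized form can be checked the same way. This sketch (2-D, with illustrative values and hypothetical helper names) uses the symmetric form $\frac{1}{d_1}VDV^\prime$, which differs from $\frac{1}{d_1}DV^\prime$ only by the rotation $V$ and so gives the same values in $(1)$. It verifies $Me_i = \lambda_i e_i$ for the columns $e_i$ of $V$ and confirms that the cosine is the same as for the original $M = UDV^\prime$:

```python
import math

def rotation(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s], [s, c]]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def apply(X, v):
    return [sum(x * y for x, y in zip(row, v)) for row in X]

def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

d = [3.0, 0.5]                  # singular values sorted so d_1 >= d_2 (illustrative)
lam = [di / d[0] for di in d]   # lambda_i = d_i / d_1, so lambda_1 = 1
D = [[d[0], 0.0], [0.0, d[1]]]
V = rotation(0.3)
U = rotation(1.1)

M_orig = matmul(U, matmul(D, transpose(V)))                                       # U D V'
M_norm = [[x / d[0] for x in row] for row in matmul(V, matmul(D, transpose(V)))]  # (1/d_1) V D V'

e = transpose(V)                # rows of V' are the columns e_i of V
for i in range(2):
    print(apply(M_norm, e[i]), [lam[i] * x for x in e[i]])

A, B = [1.0, 0.2], [-0.4, 1.0]
cos_orig = cos_sim(apply(M_orig, A), apply(M_orig, B))
cos_norm = cos_sim(apply(M_norm, A), apply(M_norm, B))
print(cos_orig, cos_norm)
```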

Analysis of a special case

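Let $n = 2$. Because changing the lengths of vectors does not change the angles between them, we may assume $A$ and $B$ are unit vectors. In the plane, all such vectors can be specified by the angle they make with $e_1$. Letting $\theta$ be the direction midway between $A$ and $B$, this allows us to write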

$$A = \cos(\theta-\phi)\,e_1 + \sin(\theta-\phi)\,e_2.$$

Consequently

$$B = \cos(\theta+\phi)\,e_1 + \sin(\theta+\phi)\,e_2.$$

(See the figure below.)

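Applying $M$ is simple: it fixes the first coordinates of $A$ and $B$ and multiplies their second coordinates by $\lambda_2$. The angle from $MA$ to $MB$ therefore is (up to a multiple of $\pi$, since $\arctan$ returns values only in $(-\pi/2, \pi/2)$)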

$$f(\theta) = \arctan\big(\lambda_2\tan(\theta+\phi)\big) - \arctan\big(\lambda_2\tan(\theta-\phi)\big).$$
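A small sketch for checking $f(\theta)$ numerically (function names and parameter values are illustrative). It compares the arctan expression with a version based on `math.atan2`, which tracks the quadrant of $MA$ and $MB$, and with the angle obtained directly from the cosine similarity of $MA$ and $MB$. The arctan expression agrees while $\theta \pm \phi$ stays inside $(-\pi/2, \pi/2)$; outside that range it can be off by a multiple of $\pi$, because $\arctan$ only returns values in $(-\pi/2, \pi/2)$.

```python
import math

def f_arctan(theta, phi, lam2):
    # f(theta) exactly as written: valid while theta +/- phi lies in (-pi/2, pi/2)
    return math.atan(lam2 * math.tan(theta + phi)) - math.atan(lam2 * math.tan(theta - phi))

def f_atan2(theta, phi, lam2):
    # Same signed angle from MA to MB, with atan2 choosing the correct quadrant
    a = math.atan2(lam2 * math.sin(theta - phi), math.cos(theta - phi))
    b = math.atan2(lam2 * math.sin(theta + phi), math.cos(theta + phi))
    return b - a

def angle_direct(theta, phi, lam2):
    # Unsigned angle between MA and MB from their cosine similarity
    ax, ay = math.cos(theta - phi), lam2 * math.sin(theta - phi)
    bx, by = math.cos(theta + phi), lam2 * math.sin(theta + phi)
    c = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.acos(max(-1.0, min(1.0, c)))

phi, lam2 = math.pi / 6, 0.4
for theta in (0.0, 0.5, 1.0, math.pi / 2):
    print(theta, f_arctan(theta, phi, lam2), f_atan2(theta, phi, lam2), angle_direct(theta, phi, lam2))
```

At $\theta = \pi/2$ the vector $B$ sits at $\pi/2 + \phi$, so the arctan expression is exactly $\pi$ below the true angle there.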

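Because $M$ is continuous, this angle difference is a continuous function of $\theta$. In fact, it is differentiable. This lets us find the extreme angles by inspecting the zeros of the derivative $f^\prime(\theta)$. The derivative is straightforward to compute: it is a ratio of trigonometric functions. Its zeros can only occur among the zeros of its numerator, so we need not bother computing the denominator. We obtain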

$$f^\prime(\theta) \propto \lambda_2(1-\lambda_2)(\lambda_2+1)\sin(2\theta)\sin(2\phi).$$
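The derivative can be checked against a central finite difference. In this sketch the denominator that the answer skips is written out as $g(\theta+\phi)\,g(\theta-\phi)$ with $g(x) = \cos^2 x + \lambda_2^2\sin^2 x$; that denominator is a completion of the calculation, not something stated in the answer, and it is positive whenever $\lambda_2 > 0$. The step size and test points are arbitrary choices.

```python
import math

def f(theta, phi, lam2):
    # Signed angle from MA to MB (atan2 keeps the correct quadrant)
    a = math.atan2(lam2 * math.sin(theta - phi), math.cos(theta - phi))
    b = math.atan2(lam2 * math.sin(theta + phi), math.cos(theta + phi))
    return b - a

def f_prime(theta, phi, lam2):
    # Numerator from the text over the positive denominator g(theta + phi) * g(theta - phi)
    def g(x):
        return math.cos(x) ** 2 + lam2 ** 2 * math.sin(x) ** 2
    numerator = lam2 * (1 - lam2) * (lam2 + 1) * math.sin(2 * theta) * math.sin(2 * phi)
    return numerator / (g(theta + phi) * g(theta - phi))

phi, lam2, h = 0.4, 0.3, 1e-6
for theta in (0.2, 0.7, 1.3, 2.0):
    numeric = (f(theta + h, phi, lam2) - f(theta - h, phi, lam2)) / (2 * h)
    print(f"theta={theta}: finite difference {numeric:.8f}, formula {f_prime(theta, phi, lam2):.8f}")
```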

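The special cases $\lambda_2 = 0$, $\lambda_2 = 1$, and $\phi = 0$ are easily understood: they correspond, respectively, to situations where $M$ is of reduced rank (and therefore squashes all vectors onto a line), where $M$ is a multiple of the identity matrix, and where $A$ and $B$ are parallel (so the angle between them cannot change, regardless of $\theta$). The case $\lambda_2 = -1$ is ruled out by the condition $\lambda_2 \ge 0$.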

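Apart from these special cases, zeros occur only where $\sin(2\theta) = 0$: that is, at $\theta = 0$ or $\theta = \pi/2$. This means the line determined by $e_1$ either bisects the angle between $A$ and $B$ ($\theta = 0$) or is perpendicular to that bisector ($\theta = \pi/2$). We now know that the extreme values of the angle between $MA$ and $MB$ must be among these values of $f(\theta)$, so let's compute them: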

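$$\begin{aligned}
f(0) &= \arctan\big(\lambda_2\tan(\phi)\big) - \arctan\big(\lambda_2\tan(-\phi)\big) = 2\arctan\big(\lambda_2\tan(\phi)\big);\\
f(\pi/2) &= \pi + \arctan\big(\lambda_2\tan(\pi/2+\phi)\big) - \arctan\big(\lambda_2\tan(\pi/2-\phi)\big) = \pi - 2\arctan\big(\lambda_2\cot(\phi)\big).
\end{aligned}$$

(The extra $\pi$ in $f(\pi/2)$ accounts for $MB$ lying in the second quadrant, beyond the range of $\arctan$.)

The corresponding cosines are

$$\cos\big(f(0)\big) = \frac{1-\lambda_2^2\tan(\phi)^2}{1+\lambda_2^2\tan(\phi)^2}\tag{2}$$

$$\cos\big(f(\pi/2)\big) = -\frac{1-\lambda_2^2\cot(\phi)^2}{1+\lambda_2^2\cot(\phi)^2} = \frac{\lambda_2^2-\tan(\phi)^2}{\lambda_2^2+\tan(\phi)^2}.\tag{3}$$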
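A brute-force sketch, assuming $0 < \lambda_2 < 1$ and the figure's $2\phi = \pi/3$ (the value $\lambda_2 = 0.4$ is arbitrary). It scans $\theta$ over $[0, \pi]$, computes the cosine similarity of $MA$ and $MB$ directly, and compares the largest and smallest values with the closed forms at $\theta = 0$ and $\theta = \pi/2$. The function names are hypothetical.

```python
import math

def cos_images(theta, phi, lam2):
    # Cosine similarity of MA and MB for unit A, B at angles theta -/+ phi from e_1
    ax, ay = math.cos(theta - phi), lam2 * math.sin(theta - phi)
    bx, by = math.cos(theta + phi), lam2 * math.sin(theta + phi)
    return (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))

def cos_at_theta_0(phi, lam2):
    # (1 - lam2^2 tan^2 phi) / (1 + lam2^2 tan^2 phi)
    t = (lam2 * math.tan(phi)) ** 2
    return (1 - t) / (1 + t)

def cos_at_theta_half_pi(phi, lam2):
    # (lam2^2 - tan^2 phi) / (lam2^2 + tan^2 phi)
    t2 = math.tan(phi) ** 2
    return (lam2 ** 2 - t2) / (lam2 ** 2 + t2)

phi, lam2 = math.pi / 6, 0.4
values = [cos_images(k * math.pi / 2000, phi, lam2) for k in range(2001)]
print(max(values), cos_at_theta_0(phi, lam2))
print(min(values), cos_at_theta_half_pi(phi, lam2))
```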

Often it suffices to know how $M$ distorts right angles. In that case $2\phi = \pi/2$, so $\tan(\phi) = \cot(\phi) = 1$, which you can plug into the preceding formulas.

Notice that the smaller $\lambda_2$ becomes, the more extreme these angles become and the greater the distortion.
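For a right angle the smallest attainable image angle is $f(0) = 2\arctan\lambda_2$ and the largest is $\pi - 2\arctan\lambda_2$. A short sketch tabulating this for a few illustrative values of $\lambda_2$ (the function name is hypothetical) shows how the range widens as $\lambda_2$ shrinks:

```python
import math

def right_angle_range(lam2):
    """Smallest and largest angle (degrees) between MA and MB when A is perpendicular to B."""
    smallest = 2 * math.atan(lam2)             # f(0) with tan(phi) = 1
    largest = math.pi - 2 * math.atan(lam2)    # f(pi/2) with cot(phi) = 1
    return math.degrees(smallest), math.degrees(largest)

for lam2 in (1.0, 0.75, 0.5, 0.25, 0.1):
    lo, hi = right_angle_range(lam2)
    print(f"lambda_2 = {lam2:4.2f}: {lo:6.2f} to {hi:6.2f} degrees")
```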

Figure showing four configurations

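The figure shows four configurations of the vectors $A$ and $B$, separated by an angle of $2\phi = \pi/3$. The unit circle and its elliptical image under $M$ are shaded for reference (with the action of $M$ uniformly rescaled so that $\lambda_1 = 1$). The plot titles indicate the value of $\theta$, the direction midway between $A$ and $B$. The closest together that any such $A$ and $B$ can be brought by $M$ resembles the configuration at the left, where $\theta = 0$. The furthest apart they can be resembles the one at the right, with $\theta = \pi/2$. Two intermediate possibilities are shown.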

Solution in all dimensions

We have seen that $M$ acts by expanding each dimension $e_i$ by a factor $\lambda_i$. This distorts the unit sphere $\{A \mid A^\prime A = 1\}$ into an ellipsoid. The $e_i$ determine its principal axes. The $\lambda_i$ are the distances from the origin, along these axes, to the ellipsoid. Consequently the smallest one, $\lambda_n$, is the shortest distance (in any direction) from the origin to the ellipsoid, and the largest one, $\lambda_1$, is the furthest distance (in any direction) from the origin to the ellipsoid.

In higher dimensions $n > 2$, $A$ and $B$ are part of a two-dimensional subspace. $M$ maps the unit circle in this subspace into the intersection of the ellipsoid with a plane containing $MA$ and $MB$. This intersection, being a linear distortion of a circle, is an ellipse. Obviously the furthest distance to this ellipse is no more than $\lambda_1 = 1$ and the shortest distance is no less than $\lambda_n$.

As we observed at the end of the preceding section, the most extreme possibility arises when $A$ and $B$ are situated in a plane containing two of the $e_i$ for which the ratio of the corresponding $\lambda_i$ is as small as possible. This happens in the $e_1, e_n$ plane. We already have the solution for that case: it is given by the two-dimensional formulas with $\lambda_2$ replaced by $\lambda_n/\lambda_1 = \lambda_n$.
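A Monte Carlo sketch of this argument, assuming a diagonal $M$ with illustrative values $\lambda = (1, 0.6, 0.2)$ and $2\phi = \pi/3$. It draws random pairs $A, B$ at angle $2\phi$ in $\mathbb{R}^3$ and checks that every image cosine stays between the two-dimensional extremes computed with $\lambda_n$, which are reached in the $e_1, e_n$ plane. Random sampling only illustrates the bound; it does not prove it. Helper names are hypothetical.

```python
import math
import random

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

def pair_at_angle(u, w, phi):
    # A and B are symmetric about the unit vector u, separated along the unit vector w (w orthogonal to u)
    A = [math.cos(phi) * a - math.sin(phi) * b for a, b in zip(u, w)]
    B = [math.cos(phi) * a + math.sin(phi) * b for a, b in zip(u, w)]
    return A, B

def random_pair(rng, n, phi):
    u = normalize([rng.gauss(0.0, 1.0) for _ in range(n)])
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    proj = sum(a * b for a, b in zip(u, w))
    w = normalize([b - proj * a for a, b in zip(u, w)])  # Gram-Schmidt: make w orthogonal to u
    return pair_at_angle(u, w, phi)

lam = [1.0, 0.6, 0.2]   # lambda_1 >= lambda_2 >= lambda_3 (illustrative)
phi = math.pi / 6       # the angle between A and B is 2*phi
lam_n, t2 = lam[-1], math.tan(phi) ** 2
upper = (1 - lam_n ** 2 * t2) / (1 + lam_n ** 2 * t2)
lower = (lam_n ** 2 - t2) / (lam_n ** 2 + t2)

def image_cos(A, B):
    # Cosine similarity after applying the diagonal M = diag(lam)
    return cos_sim([l * a for l, a in zip(lam, A)], [l * b for l, b in zip(lam, B)])

rng = random.Random(0)
samples = [image_cos(*random_pair(rng, 3, phi)) for _ in range(20000)]
print(lower, min(samples), max(samples), upper)

e1, e3 = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
print(image_cos(*pair_at_angle(e1, e3, phi)), image_cos(*pair_at_angle(e3, e1, phi)))
```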

Conclusions

The extremes of cosine similarity attainable by applying $M$ to two vectors having cosine similarity $\cos(2\phi)$ are given by $(2)$ and $(3)$. They are attained by situating $A$ and $B$ at equal angles to a direction in which $\Sigma = M^\prime M$ maximally lengthens any vector (such as the $e_1$ direction) and separating them in a direction in which $\Sigma$ minimally lengthens any vector (such as the $e_n$ direction).

These extremes can be computed in terms of the SVD of M.
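A minimal sketch of that computation, assuming $M$ is invertible and that its singular values are already available. For a diagonal $M$, as in the question, they are the absolute values of the diagonal entries; for a general matrix they could come from an SVD routine such as NumPy's `numpy.linalg.svd(M, compute_uv=False)`. The function name is hypothetical, and it takes $\cos(2\phi)$ rather than $\phi$, using $\tan^2\phi = (1-\cos 2\phi)/(1+\cos 2\phi)$.

```python
import math

def cosine_similarity_bounds(singular_values, cos_ab):
    """Range of cos(angle(MA, MB)) over all A, B with cos(angle(A, B)) == cos_ab,
    using the extremes at theta = 0 and theta = pi/2 with lambda_n = d_n / d_1.
    Assumes every singular value of M is positive (M invertible)."""
    d = sorted((abs(s) for s in singular_values), reverse=True)
    if d[-1] <= 0:
        raise ValueError("this sketch assumes all singular values are positive")
    if cos_ab <= -1.0:
        return (-1.0, -1.0)  # antiparallel vectors stay antiparallel
    lam_n = d[-1] / d[0]
    t2 = (1 - cos_ab) / (1 + cos_ab)  # tan(phi)^2 from cos(2 phi) = cos_ab
    upper = (1 - lam_n ** 2 * t2) / (1 + lam_n ** 2 * t2)
    lower = (lam_n ** 2 - t2) / (lam_n ** 2 + t2)
    return lower, upper

print(cosine_similarity_bounds([2.0, 1.0, 0.5], 0.0))
```

For the illustrative singular values $(2, 1, 0.5)$ we get $\lambda_n = 0.25$, so a right angle can map to any angle whose cosine lies in $[-15/17, 15/17]$.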


This is a fantastic answer! Thank you very much for this detailed discussion! I believe that you have a sign mistake in eqn (3) where you should just have an overall minus sign.
LFH

I'm interested in the case where the angle $2\phi$ approaches zero, and I would like to get an inequality between $2\phi$ and $f$. Is it true that, based on your computation, I just need to find the most extreme (that is, smallest) $\lambda_n$, and in this case the asymptotic inequality is given by $2\lambda_n\phi \le f \le 2\lambda_n^{-1}\phi$ as $\phi \to 0$?
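A quick numerical look at this question (not a proof), assuming an illustrative $\lambda_n = 0.3$. As $\phi \to 0$, the smallest attainable angle $f(0) = 2\arctan(\lambda_n\tan\phi)$ behaves like $2\lambda_n\phi$, and the largest, $\pi - 2\arctan(\lambda_n\cot\phi) = 2\arctan(\tan\phi/\lambda_n)$, behaves like $2\phi/\lambda_n$. The ratios computed below approach $\lambda_n$ and $1/\lambda_n$; the function name is hypothetical.

```python
import math

def asymptotic_ratios(lam_n, phi):
    """Return (f_min / (2 phi), f_max / (2 phi)) for the extreme image angles."""
    f_min = 2 * math.atan(lam_n * math.tan(phi))   # theta = 0
    f_max = 2 * math.atan(math.tan(phi) / lam_n)   # theta = pi/2; equals pi - 2*atan(lam_n*cot(phi))
    return f_min / (2 * phi), f_max / (2 * phi)

lam_n = 0.3
for phi in (1e-1, 1e-2, 1e-3):
    print(phi, asymptotic_ratios(lam_n, phi))
```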
LFH

6

You are probably interested in:

$$(MA, MB) = A^T(M^TM)B,$$

You can diagonalize $M^TM = U\Sigma U^T$ (or, as you folks call it, PCA), which tells you that the similarity of $A, B$ under the transformation $M$ behaves like projecting $A, B$ onto your principal components and subsequently calculating similarity in this new space. To flesh this out a bit more, let the principal components be $u_i$ with eigenvalues $\lambda_i$. Then

$$A = \sum_i (u_i, A)\,u_i, \qquad B = \sum_i (u_i, B)\,u_i,$$

which gives you:

$$(MA, MB) = \sum_{i=1}^n (u_i, A)(u_i, B)\,\lambda_i.$$
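A small 2-D check of this expansion, as a sketch with illustrative numbers. $M$ is built as $R(\alpha)\,\mathrm{diag}(s_1, s_2)\,R(\beta)^T$ from rotation matrices, so that $M^TM = R(\beta)\,\mathrm{diag}(s_1^2, s_2^2)\,R(\beta)^T$ has eigenvectors $u_i$ equal to the columns of $R(\beta)$ and eigenvalues $\lambda_i = s_i^2$, without calling an eigensolver. Here the $\lambda_i$ are eigenvalues of $M^TM$, as in this answer, not the ratios $d_i/d_1$ of the previous answer. Helper names are hypothetical.

```python
import math

def rot(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s], [s, c]]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def apply(X, v):
    return [sum(x * y for x, y in zip(row, v)) for row in X]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

alpha, beta, s = 0.8, -0.35, (2.0, 0.7)   # illustrative values
R_beta = rot(beta)
M = matmul(rot(alpha), matmul([[s[0], 0.0], [0.0, s[1]]], transpose(R_beta)))

u = transpose(R_beta)          # rows of R_beta^T are the columns u_i of R_beta
lam = [s[0] ** 2, s[1] ** 2]   # eigenvalues of M^T M

A, B = [1.0, -2.0], [0.5, 3.0]
lhs = dot(apply(M, A), apply(M, B))                                # (MA, MB)
rhs = sum(lam[i] * dot(u[i], A) * dot(u[i], B) for i in range(2))  # sum_i (u_i, A)(u_i, B) lambda_i
print(lhs, rhs)
```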

Notice that there is a scaling going on here: the $\lambda_i$ are stretching/shrinking. When $A, B$ are unit vectors and every $\lambda_i = 1$, $M$ corresponds to a rotation, and you get $\mathrm{sim}(MA, MB) = \mathrm{sim}(A, B)$, which is equivalent to saying that inner products are invariant under rotations. In general, the angle stays the same when $M$ is a conformal transformation, which in this case requires that $M$ be invertible and that the polar decomposition $M = OP$ satisfy $P = aI$, i.e. $M^TM = a^2I$.
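A short sketch contrasting the two situations, with illustrative numbers and hypothetical helper names: $M = aO$ for a rotation $O$ (so $M^TM = a^2I$) leaves the cosine similarity unchanged, while a non-uniform diagonal scaling generally does not.

```python
import math

def apply(X, v):
    return [sum(x * y for x, y in zip(row, v)) for row in X]

def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

a, t = 2.5, 0.9                                    # illustrative scale and rotation angle
O = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
M_conformal = [[a * x for x in row] for row in O]  # M = aO, so M^T M = a^2 I
M_scaling = [[1.0, 0.0], [0.0, 0.3]]               # non-uniform scaling: M^T M is not a multiple of I

A, B = [1.0, 2.0], [-1.5, 0.4]
print(cos_sim(A, B))
print(cos_sim(apply(M_conformal, A), apply(M_conformal, B)))
print(cos_sim(apply(M_scaling, A), apply(M_scaling, B)))
```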


1
Your initial statement of the problem neglects the normalization of the vectors A, B, MA, and MB required to compute the cosine similarity. It does not appear that the subsequent analysis addresses this normalization, either. Note, in particular, that the cosine similarities are preserved even when all the eigenvalues are equal to some (positive) value that differs from 1. That demonstrates, even in this simple case, that much more can be said.
whuber

@whuber: cosine similarity is preserved exactly when $M$ is a conformal transformation, which in this case is equivalent to requiring $M$ to be invertible and $M^TM = a^2I$, a multiple of the identity. Said another way, the polar decomposition of $M$ satisfies $M = OP$, where $P = aI$. You're right about normalization, but it seems silly to talk about cosine similarity with non-normalized vectors $A, B$.
Alex R.

2
Not silly at all! Since this "similarity" is given by the cosine of the angle between the vectors, it makes sense for any two non-zero vectors. What I meant by "much more can be said" is that effective bounds on the angle between the images of A and B can be obtained in terms of the angle between A and B and the eigenvalues of M.
whuber
Licensed under cc by-sa 3.0 with attribution required.