Feature map of the Gaussian kernel


24

In an SVM, the Gaussian kernel is defined as

$$K(x,y) = \exp\left(-\frac{\|x-y\|_2^2}{2\sigma^2}\right) = \phi(x)^T \phi(y),$$

where $x, y \in \mathbb{R}^n$. What is the explicit expression for $\phi$?

I also want to know whether

$$\sum_i c_i \phi(x_i) = \phi\left(\sum_i c_i x_i\right)$$

where $c_i \in \mathbb{R}$. Right now I think it is not equal, because using a kernel handles situations where a linear classifier does not work. I know $\phi$ projects $x$ into an infinite-dimensional space, so if $\phi$ remained linear, then no matter how many dimensions that space had, the SVM still could not classify well.
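One can probe the second question numerically using only kernel evaluations, without knowing $\phi$ explicitly (a minimal sketch added for illustration): if $\sum_i c_i \phi(x_i) = \phi\left(\sum_i c_i x_i\right)$ held, then inner products with any $\phi(z)$ would agree, i.e. $\sum_i c_i K(x_i, z) = K\left(\sum_i c_i x_i,\, z\right)$ for every $z$.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
xs = rng.normal(size=(3, 2))        # three points x_i in R^2
cs = np.array([0.5, -1.0, 2.0])     # coefficients c_i
x_combo = cs @ xs                   # sum_i c_i x_i

for z in rng.normal(size=(3, 2)):   # a few probe points z
    lhs = sum(c * gaussian_kernel(x, z) for c, x in zip(cs, xs))  # <sum_i c_i phi(x_i), phi(z)>
    rhs = gaussian_kernel(x_combo, z)                             # <phi(sum_i c_i x_i), phi(z)>
    print(f"{lhs:+.4f} vs {rhs:+.4f}")
# The two columns disagree, so sum_i c_i phi(x_i) != phi(sum_i c_i x_i).
```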

Why would this kernel imply a transformation? Or are you referring to the associated feature space?
Placidia

Yes, I mean the associated feature space: what is $\phi(\cdot)$ such that $\phi(x)^T \phi(x') = \exp\left(-\frac{1}{2\sigma^2}\|x - x'\|^2\right)$?
user27886

Answers:


20

You can obtain the explicit formula of $\phi$ for the Gaussian kernel via the Taylor series expansion of $e^x$. For notational simplicity, assume $x \in \mathbb{R}^1$:

$$\phi(x) = e^{-x^2/2\sigma^2} \left[1,\ \sqrt{\frac{1}{1!\,\sigma^2}}\,x,\ \sqrt{\frac{1}{2!\,\sigma^4}}\,x^2,\ \sqrt{\frac{1}{3!\,\sigma^6}}\,x^3,\ \ldots\right]^T$$

Chih-Jen Lin of NTU discusses this in more detail in his slides (see slide 11 in particular). Note that in the slides, $\gamma = \frac{1}{2\sigma^2}$ is used as the kernel parameter.

The equation in the OP holds only for a linear kernel.
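As a quick numerical check of the expansion above (a minimal sketch, not part of the original answer): truncating the series at a modest number of terms already reproduces the kernel to near machine precision.

```python
import numpy as np
from math import factorial

def phi_taylor(x, sigma=1.0, n_terms=20):
    """Truncated Taylor feature map for the 1-d Gaussian kernel."""
    coeffs = np.array([np.sqrt(1.0 / (factorial(n) * sigma ** (2 * n))) * x ** n
                       for n in range(n_terms)])
    return np.exp(-x ** 2 / (2 * sigma ** 2)) * coeffs

x, y, sigma = 0.7, -0.3, 1.0
approx = phi_taylor(x, sigma) @ phi_taylor(y, sigma)
exact = np.exp(-(x - y) ** 2 / (2 * sigma ** 2))
print(approx, exact)   # agree to ~machine precision with n_terms = 20
```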


2
Hi, but the equation above only fits the one-dimensional case.
Vivian

So the feature space here is $\ell^2$, a subspace of the reproducing kernel Hilbert space, correct?
The_Anomaly

Is there also an explicit representation of the Laplacian kernel?
Felix Crazzolara

13

For any valid PSD kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, there exists a feature map $\varphi : \mathcal{X} \to \mathcal{H}$ such that $k(x,y) = \langle \varphi(x), \varphi(y) \rangle_{\mathcal{H}}$. In fact, the space $\mathcal{H}$ and the embedding $\varphi$ need not be unique, but there is an important unique pair $(\mathcal{H}, \varphi)$ known as the reproducing kernel Hilbert space (RKHS).

The RKHS of the Gaussian kernel is discussed by: Steinwart, Hush, and Scovel, An Explicit Description of the Reproducing Kernel Hilbert Spaces of Gaussian RBF Kernels, IEEE Transactions on Information Theory 2006 (doi; free citeseer pdf).

It's a bit complicated, but it boils down to this: define $e_n : \mathbb{C} \to \mathbb{C}$ by

$$e_n(z) := \sqrt{\frac{(2\sigma^2)^n}{n!}}\, z^n e^{-\sigma^2 z^2}.$$

(Note that the kernel is parameterized here as $k(x,y) = e^{-\sigma^2 \|x-y\|^2}$, which is what this choice of $e_n$ reproduces.)

Let $n : \mathbb{N}_0 \to \mathbb{N}_0^d$ be a sequence that ranges over all $d$-tuples of nonnegative integers; if $d = 3$, perhaps $n^{(0)} = (0,0,0)$, $n^{(1)} = (0,0,1)$, $n^{(2)} = (0,1,1)$, and so on. Denote the $j$th component of the $i$th tuple by $n_{ij}$.

Then the $i$th component of $\varphi(x)$ is $\prod_{j=1}^d e_{n_{ij}}(x_j)$. So $\varphi$ maps vectors in $\mathbb{R}^d$ to infinite-dimensional complex vectors.

The catch is that we additionally have to define the norm for these infinite-dimensional complex vectors in a special way; see the paper for the details.
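Here is a minimal sketch of the construction (my own illustration, truncating each index at `n_max` and ignoring the norm subtlety just mentioned). Since the inner product factorizes over coordinates, the truncated sum should approach $e^{-\sigma^2\|x-y\|^2}$, the kernel in this paper's parameterization:

```python
import numpy as np
from math import factorial
from itertools import product

def e_n(n, z, sigma=1.0):
    """e_n(z) = sqrt((2 sigma^2)^n / n!) * z^n * exp(-sigma^2 z^2)."""
    return np.sqrt((2 * sigma ** 2) ** n / factorial(n)) * z ** n * np.exp(-sigma ** 2 * z ** 2)

def phi(x, sigma=1.0, n_max=15):
    """Truncated embedding: one component per tuple n in {0, ..., n_max-1}^d."""
    d = len(x)
    return np.array([np.prod([e_n(n[j], x[j], sigma) for j in range(d)])
                     for n in product(range(n_max), repeat=d)])

x = np.array([0.2, -0.5])
y = np.array([-0.1, 0.4])
sigma = 1.0
approx = phi(x, sigma) @ phi(y, sigma)
exact = np.exp(-sigma ** 2 * np.sum((x - y) ** 2))   # k(x,y) = exp(-sigma^2 ||x-y||^2) here
print(approx, exact)                                 # agree closely
```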


Steinwart et al. also give a (to my thinking) more straightforward embedding into $L_2(\mathbb{R}^d)$, the Hilbert space of square-integrable functions from $\mathbb{R}^d$ to $\mathbb{R}$:

$$\Phi_\sigma(x) = \frac{(2\sigma)^{d/2}}{\pi^{d/4}}\, e^{-2\sigma^2 \|x - \cdot\|_2^2}.$$

Note that $\Phi_\sigma(x)$ is itself a function from $\mathbb{R}^d$ to $\mathbb{R}$: it is essentially a scaled Gaussian density with mean $x$ and covariance $\frac{1}{4\sigma^2} I$. So when we compute

$$\langle \Phi(x), \Phi(y) \rangle_{L_2} = \int [\Phi(x)](t)\, [\Phi(y)](t)\, dt,$$

we're taking the product of two Gaussian density functions, which is itself a certain constant times a Gaussian density function. When you do that integral over $t$, the constant that falls out ends up being exactly $k(x,y)$.
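A one-dimensional numerical check of that computation (my own sketch, using the $k(x,y) = e^{-\sigma^2 (x-y)^2}$ parameterization above): discretize $t$ on a grid and integrate the product of the two scaled densities.

```python
import numpy as np

sigma, d = 1.0, 1
x, y = 0.3, -0.8

t = np.linspace(-10, 10, 20001)                      # integration grid for L2(R)
const = (2 * sigma) ** (d / 2) / np.pi ** (d / 4)    # normalizing constant of Phi_sigma

Phi_x = const * np.exp(-2 * sigma ** 2 * (x - t) ** 2)
Phi_y = const * np.exp(-2 * sigma ** 2 * (y - t) ** 2)

inner = np.sum(Phi_x * Phi_y) * (t[1] - t[0])        # <Phi(x), Phi(y)>_{L2}, Riemann sum
print(inner, np.exp(-sigma ** 2 * (x - y) ** 2))     # both ~ k(x, y)
```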

These are not the only embeddings that work.

Another is based on the Fourier transform, which the celebrated paper of Rahimi and Recht (Random Features for Large-Scale Kernel Machines, NIPS 2007) approximates to great effect.
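For the Gaussian kernel in the OP's parameterization, the spectral distribution is $\mathcal{N}(0, \sigma^{-2} I)$, so the random-features recipe is only a few lines; here is a minimal sketch (function names are my own, not from the paper):

```python
import numpy as np

def rff_features(X, n_features=2000, sigma=1.0, seed=0):
    """z(x) such that z(x) @ z(y) ~= exp(-||x-y||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features))  # samples from the spectral measure
    b = rng.uniform(0, 2 * np.pi, size=n_features)           # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
Z = rff_features(X)
approx = Z @ Z.T                                             # approximate kernel matrix
sq_dists = np.sum((X[:, None] - X[None]) ** 2, axis=-1)
exact = np.exp(-sq_dists / 2.0)                              # sigma = 1
print(np.max(np.abs(approx - exact)))                        # small, shrinks as n_features grows
```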

You can also do it using Taylor series: effectively the infinite version of Cotter, Keshet, and Srebro, Explicit Approximations of the Gaussian Kernel, arXiv:1109.4603.


1
Douglas Zare gave a 1d version of the "more straightforward" embedding in an interesting thread here.
Dougal

Here you can find a more 'intuitive' explanation of how $\Phi$ can map onto a space of dimension equal to the size of the training sample, even for an infinite training sample: stats.stackexchange.com/questions/80398/…

6

It seems to me that your second equation will only be true if ϕ is a linear mapping (and hence K is a linear kernel). As the Gaussian kernel is non-linear, the equality will not hold (except perhaps in the limit as σ goes to zero).


Thank you for your answer. When $\sigma \to 0$, the dimension the Gaussian kernel projects into would increase. And inspired by your answer, I now think the two sides are not equal, because using a kernel handles exactly the situations where linear classification does not work.
Vivian