Information gain, mutual information and related measures


33

Andrew More defines information gain as:

$$IG(Y|X) = H(Y) - H(Y|X)$$

where $H(Y|X)$ is the conditional entropy. However, Wikipedia calls the above quantity mutual information.

Wikipedia, on the other hand, defines information gain as the Kullback-Leibler divergence (a.k.a. information divergence or relative entropy) between two random variables:

$$D_{KL}(P\|Q) = H(P,Q) - H(P)$$

where $H(P,Q)$ is defined as the cross-entropy.

These two definitions seem to be inconsistent with each other.
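To make the two formulas concrete as I read them, here is a minimal Python/NumPy sketch; the toy distributions, numbers and variable names below are made up purely for illustration, and it computes each quantity separately:

```python
import numpy as np

# Toy joint distribution p(x, y) over two binary variables (values made up
# purely for illustration); rows index x, columns index y.
p_xy = np.array([[0.30, 0.20],
                 [0.10, 0.40]])

p_x = p_xy.sum(axis=1)          # marginal p(x)
p_y = p_xy.sum(axis=0)          # marginal p(y)

def entropy(p):
    """Shannon entropy H(p) in bits."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# First definition: IG(Y|X) = H(Y) - H(Y|X)
H_y = entropy(p_y)
H_y_given_x = sum(p_x[i] * entropy(p_xy[i] / p_x[i]) for i in range(len(p_x)))
ig = H_y - H_y_given_x

# Second definition: D_KL(P||Q) = H(P,Q) - H(P), with H(P,Q) the cross-entropy.
# Here P and Q are just two distributions over the same support.
P = np.array([0.6, 0.4])
Q = np.array([0.5, 0.5])
cross_entropy = -np.sum(P * np.log2(Q))
d_kl = cross_entropy - entropy(P)

print(f"IG(Y|X)    = H(Y) - H(Y|X)  = {ig:.4f} bits")
print(f"D_KL(P||Q) = H(P,Q) - H(P)  = {d_kl:.4f} bits")
```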

I have also seen other authors bring up two additional related concepts, namely differential entropy and relative information gain.

What is the precise definition of, or relationship between, these quantities? Is there a good textbook that covers them all?

  • information gain
  • mutual information
  • cross-entropy
  • conditional entropy
  • differential entropy
  • relative information gain

2
To add even further to the confusion, note that the notation you used for cross-entropy is also the same notation used for joint entropy. To avoid confusing myself I use $H_x(P,Q)$ for the cross-entropy, but that is purely for my own benefit and I have never seen that notation used anywhere else.
Michael McGowan

Answers:


24

I think that calling the Kullback-Leibler divergence "information gain" is non-standard.

The first definition is standard.

EDIT: However, $H(Y) - H(Y|X)$ is also called mutual information.

Note that I don't think you will find any scientific discipline that really has a standardized, precise, and consistent naming scheme, so you will always have to look at the formulas, since they will generally give you a better idea.

Textbooks: see "A good introduction to the various kinds of entropy".

Also: Cosma Shalizi: Methods and Techniques of Complex Systems Science: An Overview, chapter 1 (pp. 33--114) in Thomas S. Deisboeck and J. Yasha Kresh (eds.), Complex Systems Science in Biomedicine, http://arxiv.org/abs/nlin.AO/0307015

Robert M. Gray: Entropy and Information Theory, http://ee.stanford.edu/~gray/it.html

David MacKay: Information Theory, Inference, and Learning Algorithms, http://www.inference.phy.cam.ac.uk/mackay/itila/book.html

Also, "What is entropy and information gain?"


Thanks @wolf. I'm inclined to accept this answer. If the first definition is the standard one, how would you then define mutual information?
Amelio Vazquez-Reina

2
Sorry: the first quantity, $IG(Y|X) = H(Y) - H(Y|X)$, is also often called mutual information; this is a case of inconsistent naming. As I said, I don't think there is any consistent, unambiguous, one-to-one correspondence between concepts and names. For example, "mutual information" or "information gain" is a special case of the KL divergence, so the Wikipedia article is not that far off either.
wolf.rauch

4

Mutual information is the Kullback-Leibler divergence between the joint distribution $p(X,Y)$ and the product of the marginals $P(X)P(Y)$:

$$\begin{aligned}
I(X;Y) &= H(Y) - H(Y \mid X) \\
&= -\sum_y p(y) \log p(y) + \sum_{x,y} p(x)\, p(y \mid x) \log p(y \mid x) \\
&= \sum_{x,y} p(x,y) \log p(y \mid x) - \sum_y \Big(\sum_x p(x,y)\Big) \log p(y) \\
&= \sum_{x,y} p(x,y) \log p(y \mid x) - \sum_{x,y} p(x,y) \log p(y) \\
&= \sum_{x,y} p(x,y) \log \frac{p(y \mid x)}{p(y)} \\
&= \sum_{x,y} p(x,y) \log \frac{p(y \mid x)\, p(x)}{p(y)\, p(x)} \\
&= \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(y)\, p(x)} \\
&= D_{KL}\big(P(X,Y) \,\|\, P(X)P(Y)\big)
\end{aligned}$$

Note: $p(y) = \sum_x p(x,y)$
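For what it's worth, the identity is easy to check numerically. Below is a minimal NumPy sketch (the joint table entries are made up for illustration, and the variable names are mine) comparing the first and last lines of the derivation:

```python
import numpy as np

# Arbitrary joint distribution p(x, y); the entries are made up for illustration.
p_xy = np.array([[0.25, 0.15, 0.10],
                 [0.05, 0.30, 0.15]])
p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)

# H(Y) - H(Y|X), written directly from the first two lines of the derivation.
H_y = -np.sum(p_y * np.log(p_y))
p_y_given_x = p_xy / p_x
H_y_given_x = -np.sum(p_xy * np.log(p_y_given_x))
lhs = H_y - H_y_given_x

# D_KL( p(x,y) || p(x)p(y) ), the last line of the derivation.
rhs = np.sum(p_xy * np.log(p_xy / (p_x * p_y)))

print(lhs, rhs)   # the two numbers agree (up to floating-point error)
```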


1

Mutual information can be defined using the Kullback-Leibler divergence as

$$I(X;Y) = D_{KL}\big(p(x,y)\,\|\,p(x)p(y)\big).$$

1

Extracting mutual information from textual datasets as a feature to train a machine learning model (the task was to predict the age, gender, and personality of bloggers):

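As a rough sketch of that idea (not the original code: the toy corpus, labels, and the helper `word_label_mi` below are invented purely for illustration), one can score each word by the mutual information between its presence in a document and the blogger's label:

```python
from collections import Counter
from math import log2

# Toy corpus of (document, label) pairs; the texts and labels are invented
# purely to illustrate scoring a word by its mutual information with the label.
docs = [("i love playing football with friends", "male"),
        ("shopping and fashion are my passion", "female"),
        ("football scores and match highlights", "male"),
        ("my fashion blog and style tips", "female")]

def word_label_mi(word, docs):
    """Mutual information (in bits) between the word's presence and the label."""
    n = len(docs)
    # Joint counts over (word present?, label).
    joint = Counter((word in text.split(), label) for text, label in docs)
    presence_counts = Counter(w for w, _ in joint.elements())
    label_counts = Counter(l for _, l in joint.elements())
    mi = 0.0
    for (w, l), c in joint.items():
        p_wl = c / n
        mi += p_wl * log2(p_wl / ((presence_counts[w] / n) * (label_counts[l] / n)))
    return mi

for word in ["football", "fashion", "and"]:
    print(word, round(word_label_mi(word, docs), 3))
```

Words whose presence is strongly associated with one label score close to 1 bit here, while uninformative words score near 0, which is what makes the quantity useful as a feature-selection criterion.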


1

Both definitions are correct and consistent. I'm not sure what you find unclear, as you point out several points that might need clarification.

Firstly: $MI$ (Mutual Information), $IG$ (Information Gain), and $I$ (Information) are all different names for the same thing. In different contexts one of these names may be preferable; I will call it Information from here on.

The second point is the relation between the Kullback–Leibler divergence, $D_{KL}$, and Information. The Kullback–Leibler divergence is simply a measure of dissimilarity between two distributions, and Information can be defined in terms of such a dissimilarity (see Yters' response). So Information is a special case of $D_{KL}$, where $D_{KL}$ is applied to measure the difference between the actual joint distribution of two variables (which captures their dependence) and the hypothetical joint distribution of the same variables, were they to be independent. We call that quantity Information.

The third point to clarify is the inconsistent, though standard, notation being used, namely that $H(X,Y)$ is the notation both for joint entropy and for cross-entropy.

So, for example, in the definition of Information:

$$\begin{aligned}
I(X;Y) &= H(X) - H(X \mid Y) \\
&= H(Y) - H(Y \mid X) \\
&= H(X) + H(Y) - H(X,Y) \\
&= H(X,Y) - H(X \mid Y) - H(Y \mid X)
\end{aligned}$$

in both last lines, $H(X,Y)$ is the joint entropy. This may seem inconsistent with the definition on the Information gain page, however: $D_{KL}(P\|Q) = H(P,Q) - H(P)$; but you did not fail to quote the important clarification that $H(P,Q)$ is being used there as the cross-entropy (as is also the case on the cross entropy page).

Joint entropy and cross-entropy are NOT the same.
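To make the distinction concrete, here is a tiny NumPy sketch (the numbers are made up): joint entropy takes a single joint distribution as input, whereas cross-entropy takes two distributions over the same support:

```python
import numpy as np

def joint_entropy(p_xy):
    """H(X, Y): entropy of a single joint distribution p(x, y)."""
    p = p_xy[p_xy > 0]
    return -np.sum(p * np.log2(p))

def cross_entropy(p, q):
    """H(P, Q) = -sum_i p_i log q_i: takes TWO distributions over the same support."""
    return -np.sum(p * np.log2(q))

# Made-up numbers, for illustration only.
p_xy = np.array([[0.4, 0.1],
                 [0.2, 0.3]])          # one joint distribution over (X, Y)
P = np.array([0.7, 0.3])               # two separate distributions over the
Q = np.array([0.5, 0.5])               # same binary support

print(joint_entropy(p_xy))             # H(X, Y)
print(cross_entropy(P, Q))             # H(P, Q), a different kind of object
print(cross_entropy(P, Q) - cross_entropy(P, P))   # = D_KL(P || Q) >= 0
```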

Check out this and this, where this ambiguous notation is addressed and a distinct notation for cross-entropy is offered: $H_q(p)$.

I would hope to see this notation accepted and the wiki-pages updated.


wonder why the equations are not displayed properly..
Shaohua Li