How to choose an optimal number of latent factors in non-negative matrix factorization?


16

Given a non-negative matrix $V_{m \times n}$, non-negative matrix factorization (NMF) finds two non-negative matrices $W_{m \times k}$ and $H_{k \times n}$ (i.e., with all elements $\ge 0$) that factorize the matrix as:

$$V \approx WH,$$

e.g. by requiring non-negative $W$ and $H$ that minimize the reconstruction error

$$\|V - WH\|^2.$$

Are there common practices to estimate the number $k$ in NMF? For example, how could cross-validation be used for this purpose?


I don't have any reference (actually I ran a quick search on Google Scholar and could not find any), but I believe that cross-validation should be possible here.
amoeba says Reinstate Monica, 2014

2
Could you tell me more details about how to perform cross-validation for NMF? The Frobenius-norm error will always decrease as K increases.
Steve Sailer

What are you using NMF for? Is it to represent $V$ in a lower-dimensional space (unsupervised), or to provide recommendations (supervised)? How large is your $V$? Do you need to explain a certain percentage of the variance? You can apply CV after you have defined your objective metric. I would encourage you to think about the application and to find a metric that is meaningful.
ignorant

Answers:


10

To choose an optimal number of latent factors in non-negative matrix factorization, use cross-validation.

As you wrote, the aim of NMF is to find low-dimensional $W$ and $H$ with all non-negative elements minimizing the reconstruction error $\|V - WH\|^2$. Imagine that we leave out one element of $V$, e.g. $V_{ab}$, and perform NMF of the resulting matrix with one missing cell. This means finding $W$ and $H$ minimizing the reconstruction error over all non-missing cells:

$$\sum_{ij \ne ab} (V_{ij} - [WH]_{ij})^2.$$

Once this is done, we can predict the left-out element $V_{ab}$ by computing $[WH]_{ab}$, and calculate the prediction error

$$e_{ab} = (V_{ab} - [WH]_{ab})^2.$$

One can repeat this procedure leaving out all elements $V_{ab}$ one at a time, and sum up the prediction errors over all $a$ and $b$. This will result in an overall PRESS value (predicted residual sum of squares) $E(k) = \sum_{ab} e_{ab}$ that will depend on $k$. Hopefully the function $E(k)$ will have a minimum that can be used as an "optimal" $k$.

Note that this can be computationally costly, because the NMF has to be repeated for each left-out value, and might also be tricky to program (depending on how easy it is to perform NMF with missing values). In PCA one can get around this by leaving out full rows of $V$ (which accelerates the computations a lot), see my reply in How to perform cross-validation for PCA to determine the number of principal components?, but this is not possible here.

Of course, all the usual principles of cross-validation apply here, so one can leave out many cells at a time (instead of only a single one), and/or repeat the procedure for only some random cells instead of looping over all cells. Both approaches can help accelerate the procedure.
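The scheme above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the post: it assumes mask-weighted multiplicative updates to handle the held-out cells, holds out a random 10% of cells at once (the faster variant mentioned above), and all function names are my own.

```python
import numpy as np

def masked_nmf(V, mask, k, n_iter=300, seed=0, eps=1e-9):
    """Fit NMF to V using only the cells where mask == 1,
    via mask-weighted multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    MV = mask * V
    for _ in range(n_iter):
        H *= (W.T @ MV) / (W.T @ (mask * (W @ H)) + eps)
        W *= (MV @ H.T) / ((mask * (W @ H)) @ H.T + eps)
    return W, H

def press(V, k, holdout_frac=0.1, seed=0):
    """E(k): squared prediction error summed over randomly held-out cells."""
    rng = np.random.default_rng(seed)
    mask = (rng.random(V.shape) > holdout_frac).astype(float)
    W, H = masked_nmf(V, mask, k, seed=seed)
    return float(np.sum(((1 - mask) * (V - W @ H)) ** 2))

# Toy data of true rank 3 plus a little noise.
rng = np.random.default_rng(1)
V = rng.random((30, 3)) @ rng.random((3, 40)) + 0.01 * rng.random((30, 40))
errors = {k: press(V, k) for k in range(1, 7)}
```

Leaving out a block of cells at once keeps the cost at one NMF fit per candidate $k$, instead of one fit per cell.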

Edit (March 2019): See this very nice illustrated post by @AlexWilliams: http://alexhwilliams.info/itsneuronalblog/2018/02/26/crossval. Alex uses https://github.com/kimjingu/nonnegfac-python for NMF with missing values.


4

To my knowledge, there are two good criteria: 1) the cophenetic correlation coefficient and 2) comparing the residual sum of squares against randomized data for a set of ranks (maybe there is a name for it, but I don't remember).

  1. Cophenetic correlation coefficient: You repeat NMF several times per rank and compute how similar the results are. In other words, how stable are the identified clusters, given that the initial seed is random. Choose the highest K before the cophenetic coefficient drops.
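A rough sketch of criterion 1, assuming scikit-learn's NMF and SciPy's hierarchical-clustering utilities (the helper name and the toy data are mine): fit NMF several times per rank with different seeds, build a consensus matrix of cluster co-assignments, and compute the cophenetic correlation of the distances it induces.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform
from sklearn.decomposition import NMF

def cophenetic_score(V, k, n_runs=10):
    """Stability of the k clusters across random restarts (Brunet et al. 2004 style)."""
    n = V.shape[1]
    consensus = np.zeros((n, n))
    for seed in range(n_runs):
        model = NMF(n_components=k, init='random', random_state=seed, max_iter=500)
        model.fit(V)
        labels = model.components_.argmax(axis=0)  # dominant metagene per sample
        consensus += labels[:, None] == labels[None, :]
    consensus /= n_runs
    dist = squareform(1.0 - consensus, checks=False)
    return cophenet(linkage(dist, method='average'), dist)[0]

# Two clearly separated blocks of samples: stability should be high at k = 2.
rng = np.random.default_rng(0)
V = np.zeros((20, 30))
V[:10, :15] = 1.0
V[10:, 15:] = 1.0
V += 0.05 * rng.random(V.shape)
score = cophenetic_score(V, 2, n_runs=5)
```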

  2. RSS against random data: For any dimensionality-reduction approach, there is always a loss of information compared to the original data (estimated by the RSS). Now perform NMF for increasing K and compute the RSS with both your original data set and a randomized data set. When comparing RSS as a function of K, the RSS decreases with increasing K in the original data set, but this is less the case for the randomized data set. By comparing the two slopes, there should be a K where the two curves cross. In other words, how much information could you afford to lose (= the highest K) before you are within the noise.
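Criterion 2 can be sketched as follows (again assuming scikit-learn's NMF; the permutation scheme and names are my own): compute the RSS as a function of K on the original data and on data where each column has been shuffled independently, and look for where the slopes of the two curves cross.

```python
import numpy as np
from sklearn.decomposition import NMF

def rss_curve(V, ks):
    """Residual sum of squares of the rank-k NMF reconstruction, for each k."""
    rss = []
    for k in ks:
        model = NMF(n_components=k, init='nndsvd', max_iter=500)
        W = model.fit_transform(V)
        rss.append(float(np.sum((V - W @ model.components_) ** 2)))
    return np.array(rss)

rng = np.random.default_rng(0)
V = rng.random((40, 4)) @ rng.random((4, 50))  # true rank 4
V_rand = rng.permuted(V, axis=0)               # shuffle each column: structure destroyed
ks = range(1, 7)
rss_orig = rss_curve(V, ks)
rss_rand = rss_curve(V_rand, ks)
```

On the original data the RSS should drop sharply up to the true rank and then flatten, while on the shuffled data it declines only gradually.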

Hope I was clear enough.

Edit: I have found those articles.

1. Jean-Philippe Brunet, Pablo Tamayo, Todd R. Golub, and Jill P. Mesirov. Metagenes and molecular pattern discovery using matrix factorization. Proceedings of the National Academy of Sciences, 101(12):4164-4169, 2004.

2. Attila Frigyesi and Mattias Höglund. Non-negative matrix factorization for the analysis of complex gene expression data: identification of clinically relevant tumor subtypes. Cancer Informatics, 6:275-292, 2008.


It is still not clear to me why the RSS computed on random data should be lower than the RSS computed on the original data when K is small. For the rest, I understand that the RSS on random data should decrease more slowly than on the original data.
Malik Koné

1

In the NMF factorization, the parameter $k$ (noted $r$ in most of the literature) is the rank of the approximation of $V$ and is chosen such that $k < \min(m,n)$. The choice of the parameter determines the representation of your data $V$ in an over-complete basis composed of the columns of $W$: the $w_i$, $i = 1, 2, \dots, k$. The result is that the ranks of the matrices $W$ and $H$ have an upper bound of $k$, and the product $WH$ is a low-rank approximation of $V$, of rank at most $k$. Hence the choice of $k < \min(m,n)$ should constitute a dimensionality reduction where $V$ can be generated/spanned from the aforementioned basis vectors.
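The rank bound can be checked directly; a tiny illustration with hypothetical shapes ($m = 6$, $n = 5$, $k = 2$), not taken from the answer:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((6, 2))   # m x k, non-negative
H = rng.random((2, 5))   # k x n, non-negative
V_approx = W @ H         # m x n, but of rank at most k = 2
print(np.linalg.matrix_rank(V_approx))
```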

Further details can be found in chapter 6 of this book by S. Theodoridis and K. Koutroumbas.

The product $WH$ is thus a compressed, rank-$k$ representation of the data in $V$. The optimal $k$ is chosen empirically: working with different values of $k$ (and hence different numbers of columns in $W$) is tantamount to working with different dimensionality-reduced feature spaces.


4
But the question was about how to choose the optimal k! Can you provide any insights about that?
amoeba says Reinstate Monica

@amoeba Unless I misread the initial question, it is "Are there common practices to estimate the number k in NMF?". The optimal k is chosen empirically. I have expanded my answer.
Gilles

2
Your explanation of the NMF factorization makes total sense, but the initial question was specifically about the common practices to estimate k. Now you wrote that one can choose k "empirically" (okay) "by working with different feature sub-spaces". I am not sure I understand what "working with different feature sub-spaces" means, could you expand on that? How should one work with them? What is the recipe to choose k? This is what the question is about (at least as I understood it). Will be happy to revert my downvote!
amoeba says Reinstate Monica

2
I appreciate your edits, and am very sorry for being so dumb. But let's say I have my data, and I [empirically] try various values of k between 1 and 50. How am I supposed to choose the one which worked the best??? This is how I understand the original question, and I cannot find anything in your reply about that. Please let me know if I missed it, or if you think that the original question was different.
amoeba says Reinstate Monica

1
@amoeba That will depend on your application, data, and what you want to accomplish. Is it just the dimensionality reduction, or source separation, etc ? In audio applications for instance, say source separation, the optimal k would be the one that gives you the best quality when listening to the separated audio sources. The motivation for the choice here will of course be different if you were working with images for instance.
Gilles
Licensed under cc by-sa 3.0 with attribution required.