
Snapshot ensembles of non-negative matrix factorization for stability of topic modeling


Abstract

Recently, many topic models such as Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) have made important progress towards generating high-level knowledge from a large corpus. However, because these algorithms rely on random initialization, they produce different results on the same corpus with the same parameters, a phenomenon known as the instability problem. To address this problem, ensembles of NMF are known to be much more stable and accurate than individual NMFs. However, training multiple NMFs for ensembling is computationally expensive. In this paper, we propose a novel scheme that achieves the seemingly contradictory goal of ensembling multiple NMFs without any additional training cost. We train a single NMF algorithm with a cyclical learning-rate schedule, which converges to several local minima along its optimization path. Each time the model converges, we save the result to the ensemble and then restart the optimization with a large learning rate that helps escape the current local minimum. In experiments on text corpora assessed with a number of measures, our method reduces instability at no additional training cost while simultaneously yielding more accurate topic models than traditional single-model and ensemble methods.
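The abstract describes a simple core loop: optimize NMF under a cyclical learning rate, snapshot the factor matrices each time the rate anneals to its minimum, then restart with a large rate to escape the current local minimum. Below is a minimal NumPy sketch of that idea, assuming a cosine-annealed schedule and projected gradient updates; all function and parameter names (nmf_snapshot_ensemble, eta_max, steps_per_cycle, and so on) are illustrative assumptions, not the paper's actual implementation.

import numpy as np

def nmf_snapshot_ensemble(V, k, n_cycles=5, steps_per_cycle=300,
                          eta_max=1e-4, seed=0):
    # Factorize V ~= W @ H (documents x terms, k topics) by projected
    # gradient descent, collecting one (W, H) snapshot per learning-rate
    # cycle. Hyperparameters here are illustrative, not from the paper.
    rng = np.random.default_rng(seed)
    n, m = V.shape
    scale = np.sqrt(V.mean() / k)      # start W @ H near V's magnitude
    W = rng.random((n, k)) * scale
    H = rng.random((k, m)) * scale
    snapshots = []
    for _ in range(n_cycles):
        for t in range(steps_per_cycle):
            # Cosine-annealed rate: large at the start of each cycle
            # (helps escape the previous local minimum), near zero at the end.
            eta = 0.5 * eta_max * (1.0 + np.cos(np.pi * t / steps_per_cycle))
            R = W @ H - V              # reconstruction residual
            W = np.maximum(W - eta * (R @ H.T), 0.0)  # projection keeps
            H = np.maximum(H - eta * (W.T @ R), 0.0)  # the factors non-negative
        snapshots.append((W.copy(), H.copy()))        # save this local minimum
    return snapshots

# Toy usage: 100 documents, 500 terms, 10 topics.
V = np.random.default_rng(1).random((100, 500))
snaps = nmf_snapshot_ensemble(V, k=10)
print(len(snaps), "snapshots from a single training run")

The sketch stops at collecting the snapshots; how the paper combines them into one ensemble topic model (for example, by matching or clustering topics across snapshots) is outside the scope of this illustration.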
