Venue: Annual Conference on Neural Information Processing Systems

Anchor-Free Correlated Topic Modeling: Identifiability and Algorithm


Abstract

In topic modeling, many algorithms that guarantee identifiability of the topics have been developed under the premise that there exist anchor words - i.e., words that only appear (with positive probability) in one topic. Follow-up work has resorted to third- or higher-order statistics of the data corpus to relax the anchor-word assumption. Reliable estimates of higher-order statistics are hard to obtain, however, and the identification of topics under those models hinges on uncorrelatedness of the topics, which can be unrealistic. This paper revisits topic modeling based on second-order moments and proposes an anchor-free topic mining framework. The proposed approach guarantees the identification of the topics under a much milder condition than the anchor-word assumption, thereby exhibiting much better robustness in practice. The associated algorithm involves only one eigen-decomposition and a few small linear programs, which makes it easy to implement and to scale up to very large problem instances. Experiments using the TDT2 and Reuters-21578 corpora demonstrate that the proposed anchor-free approach exhibits very favorable performance (measured using coherence, similarity count, and clustering accuracy metrics) compared to the prior art.
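To make the anchor-word definition concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm. It only checks whether a toy word-topic matrix satisfies the anchor-word assumption, and it forms a second-order (word-word co-occurrence) moment under the assumed factorization P = C E C^T, with C the word-topic matrix and E the topic-topic correlation matrix. The function names, the factorization as written here, and all numeric values are illustrative assumptions rather than details given in the abstract.

```python
# Illustrative sketch only: helper names and matrix values are hypothetical.

def anchor_words(word_topic, tol=0.0):
    """Map each topic index to the words that have positive probability
    in that topic and in no other topic (the anchor-word definition)."""
    n_topics = len(word_topic[0])
    anchors = {k: [] for k in range(n_topics)}
    for w, row in enumerate(word_topic):
        support = [k for k, p in enumerate(row) if p > tol]
        if len(support) == 1:
            anchors[support[0]].append(w)
    return anchors


def satisfies_anchor_assumption(word_topic, tol=0.0):
    """True when every topic has at least one anchor word."""
    return all(anchor_words(word_topic, tol).values())


def second_order_moment(word_topic, topic_corr):
    """Assumed model: P = C E C^T (word-word co-occurrence probabilities)."""
    n_words = len(word_topic)
    n_topics = len(topic_corr)
    # First form C E (n_words x n_topics).
    ce = [[sum(word_topic[i][k] * topic_corr[k][j] for k in range(n_topics))
           for j in range(n_topics)]
          for i in range(n_words)]
    # Then multiply by C^T (n_words x n_words).
    return [[sum(ce[i][k] * word_topic[j][k] for k in range(n_topics))
             for j in range(n_words)]
            for i in range(n_words)]


# Rows are words, columns are topics; each column sums to 1.
with_anchors = [[0.5, 0.0], [0.0, 0.4], [0.3, 0.3], [0.2, 0.3]]
without_anchors = [[0.4, 0.1], [0.1, 0.4], [0.3, 0.3], [0.2, 0.2]]
# Off-diagonal mass means the two topics are correlated.
topic_corr = [[0.3, 0.2], [0.2, 0.3]]

print(anchor_words(with_anchors))
print(satisfies_anchor_assumption(without_anchors))
P = second_order_moment(without_anchors, topic_corr)
```

In `without_anchors` every word appears in both topics, so no anchor word exists. This is the kind of case the abstract's milder identifiability condition is meant to cover, although the condition itself is not stated in the abstract and is not checked here.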
