Anchor-Free Correlated Topic Modeling: Identifiability and Algorithm


Abstract

In topic modeling, many algorithms that guarantee identifiability of the topics have been developed under the premise that there exist anchor words, i.e., words that appear (with positive probability) in only one topic. Follow-up work has resorted to third- or higher-order statistics of the data corpus to relax the anchor-word assumption. Reliable estimates of higher-order statistics are hard to obtain, however, and the identification of topics under those models hinges on uncorrelatedness of the topics, which can be unrealistic. This paper revisits topic modeling based on second-order moments and proposes an anchor-free topic mining framework. The proposed approach guarantees the identification of the topics under a much milder condition than the anchor-word assumption, thereby exhibiting much better robustness in practice. The associated algorithm involves only one eigen-decomposition and a few small linear programs, which makes it easy to implement and to scale up to very large problem instances. Experiments using the TDT2 and Reuters-21578 corpora demonstrate that the proposed anchor-free approach performs very favorably (as measured by coherence, similarity count, and clustering accuracy) compared to the prior art.
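To make the two central notions of the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' algorithm: it only checks a toy word-topic matrix for anchor words and forms a second-order moment under the assumed model P = C E C^T, where C (word-topic probabilities), E (topic-topic correlation), and the helper names `anchor_words` and `second_order_moment` are illustrative assumptions, not taken from this record.

```python
def anchor_words(C, tol=1e-12):
    """Return {topic index: [indices of words that are anchors for it]}.

    C is a list of rows, one per word; C[v][k] is the probability of
    word v under topic k. Word v is an anchor for topic k if it has
    positive probability under topic k and zero under every other topic.
    """
    num_topics = len(C[0])
    anchors = {k: [] for k in range(num_topics)}
    for v, row in enumerate(C):
        support = [k for k, p in enumerate(row) if p > tol]
        if len(support) == 1:
            anchors[support[0]].append(v)
    return anchors


def second_order_moment(C, E):
    """Word-word co-occurrence matrix P = C E C^T (assumed model)."""
    V, K = len(C), len(E)
    CE = [[sum(C[v][i] * E[i][k] for i in range(K)) for k in range(K)]
          for v in range(V)]
    return [[sum(CE[v][k] * C[w][k] for k in range(K)) for w in range(V)]
            for v in range(V)]


# Hypothetical toy data: 4 words, 2 topics; each column of C sums to 1.
C_anchor = [[0.5, 0.0], [0.3, 0.2], [0.2, 0.3], [0.0, 0.5]]
C_no_anchor = [[0.4, 0.1], [0.3, 0.2], [0.2, 0.3], [0.1, 0.4]]
# Correlated topics: the off-diagonal entries of E are nonzero.
E = [[0.3, 0.2], [0.2, 0.3]]

print(anchor_words(C_anchor))     # → {0: [0], 1: [3]}
print(anchor_words(C_no_anchor))  # → {0: [], 1: []}

P = second_order_moment(C_anchor, E)
```

In the first matrix, words 0 and 3 are anchors; in the second, every word has positive probability under both topics, so the anchor-word premise fails. The sketch says nothing about whether either toy matrix satisfies the milder identifiability condition the abstract refers to.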


Bibliographic record

Source: Annual Conference on Neural Information Processing Systems, 2016, pp. 1794-1802 (9 pages)
Authors: Kejun Huang; Xiao Fu; Nicholas D. Sidiropoulos
Original format: PDF

