Incorporating Word Correlation Knowledge into Topic Modeling

Abstract

This paper studies how to incorporate external word correlation knowledge to improve the coherence of topic modeling. Existing topic models assume words are generated independently and lack a mechanism to exploit the rich similarity relationships among words to learn coherent topics. To solve this problem, we build a Markov Random Field (MRF) regularized Latent Dirichlet Allocation (LDA) model, which defines an MRF on the latent topic layer of LDA to encourage words labeled as similar to share the same topic label. Under our model, the topic assignment of each word is not independent, but rather affected by the topic labels of its correlated words. Similar words have a better chance of being assigned to the same topic due to the MRF regularization, so the coherence of topics is boosted. In addition, our model accommodates the subtlety that whether two words are similar depends on which topic they appear in, which allows words with multiple senses to be placed into different topics properly. We derive a variational inference method to infer the posterior probabilities and learn model parameters, and present techniques to deal with the hard-to-compute partition function in the MRF. Experiments on two datasets demonstrate the effectiveness of our model.
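For intuition, the document-level topic prior the abstract describes could take roughly the following form; the pair set P_d of correlated words, the agreement indicator I(·), the coupling strength λ, and the partition function A(θ_d) are illustrative notation for this sketch rather than the paper's exact definitions:

p(z_d \mid \theta_d) \;=\; \frac{1}{A(\theta_d)} \prod_{n=1}^{N_d} p(z_{dn} \mid \theta_d)\, \exp\!\Big( \lambda \sum_{(m,n) \in P_d} \mathbb{I}(z_{dm} = z_{dn}) \Big)

The exponential term rewards assignments in which correlated word pairs share a topic label, while the normalizer A(θ_d) is the hard-to-compute partition function that the variational techniques mentioned in the abstract are meant to handle.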
