Incorporating Word Correlation Knowledge into Topic Modeling

Abstract

This paper studies how to incorporate external word correlation knowledge to improve the coherence of topic modeling. Existing topic models assume words are generated independently and lack a mechanism to exploit the rich similarity relationships among words to learn coherent topics. To solve this problem, we build a Markov Random Field (MRF) regularized Latent Dirichlet Allocation (LDA) model, which defines an MRF on the latent topic layer of LDA to encourage words labeled as similar to share the same topic label. Under our model, the topic assignment of each word is not independent, but rather affected by the topic labels of its correlated words. Similar words have a better chance of being assigned to the same topic due to the MRF regularization, so the coherence of topics is boosted. In addition, our model accommodates the subtlety that whether two words are similar depends on which topic they appear in, which allows words with multiple senses to be placed into different topics properly. We derive a variational inference method to infer the posterior probabilities and learn model parameters, and present techniques to deal with the hard-to-compute partition function in the MRF. Experiments on two datasets demonstrate the effectiveness of our model.
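For intuition, the document-level topic prior the abstract describes could take roughly the following form; the pair set P_d of correlated words, the agreement indicator I(·), the coupling strength λ, and the partition function A(θ_d) are illustrative notation for this sketch rather than the paper's exact definitions:

p(z_d \mid \theta_d) \;=\; \frac{1}{A(\theta_d)} \prod_{n=1}^{N_d} p(z_{dn} \mid \theta_d)\, \exp\!\Big( \lambda \sum_{(m,n) \in P_d} \mathbb{I}(z_{dm} = z_{dn}) \Big)

The exponential term rewards assignments in which correlated word pairs share a topic label, while the normalizer A(θ_d) is the hard-to-compute partition function that the variational techniques mentioned in the abstract are meant to handle.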
