Conference on Uncertainty in Artificial Intelligence

Integrating Document Clustering and Topic Modeling

Abstract

Document clustering and topic modeling are two closely related tasks that can benefit each other. Topic modeling can project documents into a topic space, which facilitates effective document clustering. Cluster labels discovered by document clustering can be incorporated into topic models to extract local topics specific to each cluster and global topics shared by all clusters. In this paper, we propose a multi-grain clustering topic model (MGCTM), which integrates document clustering and topic modeling into a unified framework and performs the two tasks jointly to achieve the best overall performance. Our model tightly couples two components: a mixture component for discovering latent groups in a document collection, and a topic model component for mining multi-grain topics, including local topics specific to each cluster and global topics shared across clusters. We employ variational inference to approximate the posterior of the hidden variables and learn the model parameters. Experiments on two datasets demonstrate the effectiveness of our model.
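
To illustrate the abstract's remark that a topic space facilitates document clustering, here is a minimal sketch of the decoupled, two-stage approach: documents are represented by topic proportions and then grouped with plain k-means. This is not MGCTM, which performs both tasks jointly. The `doc_topics` values are hypothetical hand-written numbers standing in for a topic model's output, and the `kmeans` helper with its farthest-point initialization is an assumption made for this example only.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means on lists of floats with a deterministic initialization."""
    # Farthest-point init: start with the first point, then repeatedly add
    # the point whose nearest chosen centroid is farthest away.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical document-topic proportions (each row sums to 1), of the kind
# a topic model might produce for six documents over three topics.
doc_topics = [
    [0.80, 0.15, 0.05],
    [0.75, 0.20, 0.05],
    [0.10, 0.85, 0.05],
    [0.05, 0.80, 0.15],
    [0.10, 0.10, 0.80],
    [0.05, 0.15, 0.80],
]

labels, centroids = kmeans(doc_topics, k=3)
print(labels)
```

In this pipeline the clustering step cannot influence the topics. The abstract's motivation for a unified model is that cluster labels can in turn help separate cluster-specific topics from topics shared across clusters.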