Journal: Acta Electronica Sinica (电子学报)

Document Clustering Based on Probabilistic Topic Models

     

Abstract

To cluster corpora of ordinary documents and of digital books effectively, two clustering algorithms are proposed: one based on the standard LDA (Latent Dirichlet Allocation) model and one based on TC_LDA. TC_LDA extends LDA for digital book corpora by jointly modeling topics from both the table of contents and the body text of each book. Unlike traditional clustering methods, topic-model-based methods place documents in the same cluster when they share one or more topics. Empirical evaluation shows that this topic-analysis approach substantially improves clustering results compared with related traditional methods.
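The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the general idea of grouping documents that share a topic. It assumes each document already has a topic distribution (for example, one inferred by an LDA model); the function name `cluster_by_shared_topics`, the `threshold` value, and the sample distributions are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def cluster_by_shared_topics(doc_topic, threshold=0.3):
    """Group documents under every topic whose weight reaches the threshold.

    doc_topic: dict mapping a document id to its topic distribution
               (a list of floats, one weight per topic).
    Returns a dict mapping topic index -> sorted list of document ids.
    A document may appear in more than one group if it has several
    sufficiently strong topics.
    """
    clusters = defaultdict(list)
    for doc_id, theta in doc_topic.items():
        for topic, weight in enumerate(theta):
            if weight >= threshold:
                clusters[topic].append(doc_id)
    return {topic: sorted(ids) for topic, ids in clusters.items()}


# Hypothetical document-topic distributions (assumed input, not real data).
doc_topic = {
    "doc_a": [0.70, 0.20, 0.10],
    "doc_b": [0.45, 0.45, 0.10],
    "doc_c": [0.05, 0.15, 0.80],
}
clusters = cluster_by_shared_topics(doc_topic)
```

In this sketch `doc_a` and `doc_b` land in the same group because both give topic 0 a weight above the threshold, while `doc_b` also forms a group under topic 1. A hard assignment to the single highest-weight topic would be an equally plausible reading of the abstract.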
