Journal: Automatic Control and Computer Sciences

Micro-Blog Topic Detection Method Based on BTM Topic Model and K-Means Clustering Algorithm

Abstract

The development of micro-blogging, which generates large-scale short texts, gives people a convenient means of communication. At the same time, discovering topics in short texts has become a genuinely intractable problem. Traditional topic models, such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA), struggle to model short texts because they suffer from severe data sparsity when processing them. Moreover, the K-means clustering algorithm can make topics discriminative when the dataset is dense and the differences among topic documents are distinct. In this paper, the Biterm Topic Model (BTM) is employed to process short texts, namely micro-blog data, in order to alleviate the sparsity problem. We further integrate the K-means clustering algorithm into BTM for topic discovery. Experimental results on Sina micro-blog short-text collections demonstrate that our method can discover topics effectively.
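The abstract does not say exactly how K-means is integrated into BTM. The sketch below assumes one plausible reading: train BTM with a standard collapsed Gibbs sampler over the corpus-wide pool of biterms (unordered word pairs from the same post), infer each post's topic proportions P(z|d) = Σ_b P(z|b) P(b|d), and then cluster those vectors with plain K-means. All function names, hyperparameters (`alpha`, `beta`, iteration counts), the one-word-post fallback, and the toy posts are hypothetical illustrations, not the authors' implementation; it uses only the Python standard library.

```python
import random
from itertools import combinations


def extract_biterms(doc):
    """Every unordered pair of tokens in one short text is a biterm."""
    return [tuple(sorted(pair)) for pair in combinations(doc, 2)]


def train_btm(docs, num_topics, alpha=1.0, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling over all biterms (hypothetical hyperparameters)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    m = len(vocab)
    biterms = [b for d in docs for b in extract_biterms(d)]
    n_z = [0] * num_topics                                       # biterms assigned to each topic
    n_wz = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]  # word counts per topic
    assign = []

    def update(z, wi, wj, delta):
        n_z[z] += delta
        n_wz[z][wi] += delta
        n_wz[z][wj] += delta

    for wi, wj in biterms:                                       # random initial topics
        z = rng.randrange(num_topics)
        assign.append(z)
        update(z, wi, wj, 1)
    for _ in range(iters):
        for i, (wi, wj) in enumerate(biterms):
            update(assign[i], wi, wj, -1)                        # take the biterm out
            # P(z | rest) proportional to
            # (n_z + alpha)(n_wi|z + beta)(n_wj|z + beta) / (2*n_z + M*beta)^2
            weights = [(n_z[k] + alpha) * (n_wz[k][wi] + beta) * (n_wz[k][wj] + beta)
                       / (2 * n_z[k] + m * beta) ** 2 for k in range(num_topics)]
            assign[i] = rng.choices(range(num_topics), weights=weights)[0]
            update(assign[i], wi, wj, 1)                         # put it back
    theta = [(n_z[k] + alpha) / (len(biterms) + num_topics * alpha)
             for k in range(num_topics)]
    phi = [{w: (n_wz[k][w] + beta) / (2 * n_z[k] + m * beta) for w in vocab}
           for k in range(num_topics)]
    return theta, phi


def doc_topics(doc, theta, phi):
    """P(z|d) = sum_b P(z|b) P(b|d), with P(b|d) uniform over the post's biterms."""
    k_range = range(len(theta))
    p = [0.0] * len(theta)
    for wi, wj in extract_biterms(doc):
        w = [theta[k] * phi[k].get(wi, 0.0) * phi[k].get(wj, 0.0) for k in k_range]
        s = sum(w)
        if s > 0:                                                # skip biterms with unseen words
            for k in k_range:
                p[k] += w[k] / s
    total = sum(p)
    # Hypothetical fallback: posts with no usable biterm get the corpus prior theta.
    return [x / total for x in p] if total > 0 else list(theta)


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm with squared Euclidean distance."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:                                     # converged
            break
        centroids = new
    return labels, centroids


if __name__ == "__main__":
    # Toy, already-segmented posts (hypothetical data, not the Sina corpus).
    posts = [["match", "goal", "team"], ["goal", "team", "coach"],
             ["phone", "chip", "launch"], ["chip", "phone", "battery"]]
    theta, phi = train_btm(posts, num_topics=2, iters=100)
    vectors = [doc_topics(p, theta, phi) for p in posts]
    labels, _ = kmeans(vectors, k=2)
    print(labels)
```

Pooling biterms across the whole corpus is what lets BTM sidestep per-document sparsity: a post with three words contributes three biterms, and topic statistics are estimated from all of them jointly rather than from each short post in isolation. Clustering the inferred P(z|d) vectors is only one way K-means could be combined with BTM; clustering the topic-word distributions instead would be an equally plausible assumption.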
