Short Text Topic Discovery Based on BTM Topic Model

Abstract

As online social platforms continue to develop, research on hot-topic discovery in short-text data, such as Weibo posts, instant messages, and news comments, remains limited in both breadth and depth. Short-text data sets are also noisy, sparse, and irregular in form, which degrades the performance of traditional topic-discovery techniques. To address these characteristics, this paper adopts a short-text topic discovery method based on BTM (Biterm Topic Model). First, the preprocessed short texts are modeled with BTM, which captures the linguistic features of the data and yields a topic probability distribution for each text. Then the JS (Jensen-Shannon) distance between these distributions is used as the text similarity measure and combined with an improved Single-pass clustering algorithm to find the hot topics in the short-text data set. Comparison experiments show that modeling short texts with BTM and clustering them with the improved Single-pass algorithm improves clustering efficiency and effectively addresses the data sparsity of short texts, with a marked improvement in the quality of topic discovery.
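The pipeline summarized above (BTM topic distributions, JS distance, Single-pass clustering) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the BTM parameters θ (corpus topic proportions) and φ (topic-word probabilities) have already been estimated, for example by Gibbs sampling. It infers each text's topic distribution with the standard BTM rule p(z|d) = Σ_b p(z|b) p(b|d), treating every biterm in the text as equally likely. It takes "JS distance" to mean the square root of the base-2 Jensen-Shannon divergence. Because the abstract does not describe the paper's improvements, it implements plain Single-pass clustering with a hypothetical threshold of 0.5. All helper names and the toy data are hypothetical.

```python
import math
from itertools import combinations

# Toy BTM parameters (hypothetical): K = 2 topics, assumed already estimated.
THETA = [0.5, 0.5]  # p(z): corpus-level topic proportions
PHI = [  # p(w|z): topic-word probabilities
    {"match": 0.4, "goal": 0.4, "vote": 0.1, "poll": 0.1},
    {"match": 0.1, "goal": 0.1, "vote": 0.4, "poll": 0.4},
]
DOCS = [["match", "goal"], ["goal", "match", "vote"], ["vote", "poll"], ["poll", "vote"]]


def extract_biterms(tokens):
    """All unordered word pairs (biterms) co-occurring in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def doc_topic_distribution(tokens, theta, phi):
    """Infer p(z|d) = sum_b p(z|b) p(b|d), with p(b|d) uniform over the text's biterms."""
    k = len(theta)
    p_zd = [0.0] * k
    for wi, wj in extract_biterms(tokens):
        # p(z|b) is proportional to p(z) * p(wi|z) * p(wj|z)
        weights = [theta[z] * phi[z].get(wi, 0.0) * phi[z].get(wj, 0.0) for z in range(k)]
        total = sum(weights)
        if total > 0:
            for z in range(k):
                p_zd[z] += weights[z] / total
    norm = sum(p_zd)
    # Fall back to a uniform distribution for texts with no usable biterms.
    return [p / norm for p in p_zd] if norm > 0 else [1.0 / k] * k


def js_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence; lies in [0, 1]."""
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(a, b):
        # Terms with a_i = 0 contribute nothing; a_i > 0 implies m_i > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return math.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0))


def single_pass(distributions, threshold=0.5):
    """Plain Single-pass clustering over topic distributions (not the paper's improved variant)."""
    centroids, clusters = [], []
    for idx, dist in enumerate(distributions):
        # Find the nearest existing cluster centroid by JS distance.
        best, best_d = None, float("inf")
        for c, centroid in enumerate(centroids):
            d = js_distance(dist, centroid)
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d < threshold:
            clusters[best].append(idx)
            n = len(clusters[best])
            # Running mean keeps the centroid a valid probability distribution.
            centroids[best] = [(old * (n - 1) + x) / n for old, x in zip(centroids[best], dist)]
        else:
            # Too far from every cluster: the text starts a new candidate topic.
            centroids.append(list(dist))
            clusters.append([idx])
    return clusters


if __name__ == "__main__":
    dists = [doc_topic_distribution(t, THETA, PHI) for t in DOCS]
    print(single_pass(dists))  # [[0, 1], [2, 3]]
```

With these toy parameters, the two sports-like texts and the two politics-like texts end up in separate clusters. Plain Single-pass clustering is sensitive to both document order and the threshold; the paper's improved variant is not reproduced here.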
