
Research on an Improved BBTM Model for Hot Topic Discovery in Microblogs

Abstract

Current topic-model-based methods for discovering hot topics in short microblog texts suffer from sparse features, high dimensionality, and the need to specify the number of topics manually. To address these problems, this paper proposes a hot topic discovery method based on an improved bursty biterm topic model (BBTM), called the hot topic-hot biterm topic model (H-HBTM). First, the burst probability of each word is used for feature selection, filtering out non-bursty words. Second, the burst and propagation characteristics of microblog texts are combined to compute a hot burst probability for each microblog biterm (word pair), which serves as the prior probability of the BBTM. Finally, a density-based method adaptively selects the optimal number of topics, which determines the optimal BBTM used for hot topic discovery. Experiments on real microblog datasets show that H-HBTM finds the optimal topic model automatically, without a preset number of topics, and that the hot topics it discovers are of higher quality than those found by methods based on BBTM, the biterm topic model, and latent Dirichlet allocation.
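The first step of the abstract (filtering non-bursty words, then forming biterms from what remains) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption: the burst score is taken to be the share of a word's current count that exceeds its mean count over earlier time slices, and the function names, the `eps` floor, and the `threshold` value are hypothetical. The hot burst prior and the density-based choice of topic number are not sketched, since the abstract does not specify them.

```python
from collections import Counter
from itertools import combinations


def burst_probability(count_now, past_counts, eps=0.01):
    """Assumed burst score: share of the current count above the past mean.

    count_now   -- occurrences of the word in the current time slice
    past_counts -- occurrences of the word in each earlier time slice
    eps         -- small floor so the score never drops to zero or below
    """
    if count_now <= 0:
        return 0.0
    baseline = sum(past_counts) / len(past_counts) if past_counts else 0.0
    return max(count_now - baseline, eps) / count_now


def filter_bursty_words(docs_now, past_slices, threshold=0.5):
    """Drop words whose assumed burst score is below `threshold`.

    docs_now    -- tokenized posts in the current slice (list of word lists)
    past_slices -- earlier slices, each a list of tokenized posts
    """
    now = Counter(w for doc in docs_now for w in doc)
    past = [Counter(w for doc in sl for w in doc) for sl in past_slices]
    keep = {
        w for w, c in now.items()
        if burst_probability(c, [p[w] for p in past]) >= threshold
    }
    return [[w for w in doc if w in keep] for doc in docs_now]


def biterms(doc):
    """Unordered word pairs co-occurring in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(doc, 2)]


if __name__ == "__main__":
    current = [["quake", "rescue", "today"], ["quake", "today"]]
    history = [[["today", "weather"]], [["today", "lunch"], ["today"]]]
    filtered = filter_bursty_words(current, history)
    print(filtered)
    print([biterms(d) for d in filtered])
```

In this toy data, "today" occurs about as often in earlier slices as in the current one, so it scores low and is dropped, while "quake" and "rescue" are new and survive to form biterms. In a real pipeline the threshold would need tuning against the data.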

Bibliographic information

  • Source
    《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology) | 2019, No. 7 | pp. 1103-1114 | 12 pages
  • Author affiliations

    1. College of Mathematics and Computer Sciences, Fuzhou University, Fuzhou 350116, China
    2. Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou University, Fuzhou 350116, China
    3. Key Laboratory of Ministry of Education for Spatial Data Mining & Information Sharing, Fuzhou University, Fuzhou 350116, China

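  • Original format: PDF
  • Language of text: Chinese
  • CLC classification: Information processing
  • Keywords

    hot topic discovery; microblog; bursty biterm topic model (BBTM); topic model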

  • Indexed on: 2023-07-25 21:27:43
