Source: MATEC Web of Conferences

Microblog Hot Spot Mining Based on PAM Probabilistic Topic Model


Abstract

Microblogs are short texts that carry limited information, which makes topic mining more difficult. This paper proposes using the Pachinko Allocation Model (PAM), a probabilistic topic model, to model how a text's latent topics are generated, and applies it to microblog hot spot mining. First, three categories of microblog and the main contributions of the paper are presented. Second, four topic models are explained in turn, and PAM is introduced in detail in terms of how it generates a document, its document classification accuracy, and its topic correlations. Finally, MapReduce is described. Although the number of microblogs and the number of contacts are both huge, the total number of words is relatively small. With MapReduce, the microblog data are split by contact: the document-topic and contact-topic count matrices can be stored locally, while the word-topic count matrix must be stored globally. In this way, hot spot mining can be carried out on the basis of the PAM probabilistic topic model.

Key words: microblog / hot spot / PAM probabilistic topic model / MapReduce
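The storage layout described in the abstract can be illustrated with a small single-process sketch. It is not the paper's implementation: the names `map_shard`, `reduce_word_topic`, `run`, and `NUM_TOPICS` are hypothetical, the sample posts are made up, and the topic assignment is a random placeholder rather than a PAM sampling step. It only shows which count matrices stay with a contact's shard and which must be merged globally.

```python
import random
from collections import Counter, defaultdict

NUM_TOPICS = 3  # assumed value for illustration; not taken from the paper


def map_shard(docs, rng):
    """Process one contact's microblogs and return its counts.

    doc_topic and contact_topic never leave the shard (local storage);
    word_topic_delta is emitted so it can be merged across shards.
    """
    doc_topic = defaultdict(Counter)  # doc id -> topic counts (local)
    contact_topic = Counter()         # topic counts for this contact (local)
    word_topic_delta = Counter()      # (word, topic) -> count (emitted)
    for doc_id, text in docs:
        for word in text.split():
            # Placeholder: a real system would sample the topic from the model.
            topic = rng.randrange(NUM_TOPICS)
            doc_topic[doc_id][topic] += 1
            contact_topic[topic] += 1
            word_topic_delta[(word, topic)] += 1
    return doc_topic, contact_topic, word_topic_delta


def reduce_word_topic(deltas):
    """Sum per-shard word-topic counts into the single global matrix."""
    total = Counter()
    for delta in deltas:
        total.update(delta)
    return total


def run(posts, seed=0):
    """Split posts by contact, run the map step per shard, then reduce."""
    rng = random.Random(seed)
    shards = defaultdict(list)
    for contact, doc_id, text in posts:
        shards[contact].append((doc_id, text))
    local, deltas = {}, []
    for contact, docs in sorted(shards.items()):
        doc_topic, contact_topic, delta = map_shard(docs, rng)
        local[contact] = (doc_topic, contact_topic)
        deltas.append(delta)
    return local, reduce_word_topic(deltas)


# Made-up sample data: (contact, document id, text).
POSTS = [
    ("alice", "d1", "rain floods city"),
    ("alice", "d2", "city marathon today"),
    ("bob", "d3", "marathon results city"),
]
```

Calling `run(POSTS)` returns the per-contact local counts and the merged word-topic counts. Words such as "city" appear in several contacts' shards, which is why the word-topic matrix cannot be partitioned by contact in the way the other two matrices can.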
