Journal: Physica A: Statistical Mechanics and its Applications

A time-series based aggregation scheme for topic detection in Weibo short texts



Abstract

Discovering hot topics in social networks such as Twitter and Weibo has received much attention in recent years. While topic models such as Latent Dirichlet Allocation (LDA) have been successfully applied to topic discovery, the topics they produce are often less coherent when applied to microblog content, known as "posts". In this paper, we propose a time-series based aggregation scheme for topic modeling in Weibo. As Weibo topics are coherent within a time slice, we divide the Weibo dataset into groups by time slice. With this scheme, the posts in every group are aggregated into several longer pseudo-documents using paragraph-vector based similarity algorithms. Applying this scheme to the LDA model, we dramatically decrease the topic model perplexity and increase the clustering quality, which also allows for better discovery of underlying topics in Weibo. Furthermore, other topic models that extend LDA can then be used directly on such short texts. (C) 2019 Elsevier B.V. All rights reserved.
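
The aggregation idea in the abstract (split posts by time slice, then merge similar posts within each slice into longer pseudo-documents) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the algorithm's details, so the 24-hour slice width, the greedy single-pass merging, the 0.3 similarity threshold, and all function names below are assumptions. The sketch also swaps in plain bag-of-words cosine similarity where the paper uses paragraph-vector based similarity, only to stay dependency-free.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone
import math


def time_slice(ts: datetime, hours: int = 24) -> int:
    """Map a timestamp to an integer slice index (slice width is an assumption)."""
    return int(ts.timestamp() // (hours * 3600))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def aggregate(posts, hours=24, threshold=0.3):
    """Group posts by time slice, then greedily merge similar posts.

    posts: iterable of (datetime, text) pairs.
    Returns a dict: slice index -> list of pseudo-documents (token lists).
    Bag-of-words cosine is a stand-in for paragraph-vector similarity.
    """
    by_slice = defaultdict(list)
    for ts, text in posts:
        by_slice[time_slice(ts, hours)].append(text.lower().split())

    result = {}
    for key, docs in by_slice.items():
        clusters = []  # each item: (running term counts, merged tokens)
        for tokens in docs:
            vec = Counter(tokens)
            best, best_sim = None, threshold
            for cluster in clusters:
                sim = cosine(vec, cluster[0])
                if sim >= best_sim:
                    best, best_sim = cluster, sim
            if best is None:
                clusters.append((vec, list(tokens)))  # start a new pseudo-document
            else:
                best[0].update(vec)      # update the cluster's term counts
                best[1].extend(tokens)   # append the post to the pseudo-document
        result[key] = [tokens for _, tokens in clusters]
    return result


# Hypothetical example posts (invented for illustration only).
posts = [
    (datetime(2019, 1, 1, 10, tzinfo=timezone.utc), "earthquake hits city center"),
    (datetime(2019, 1, 1, 12, tzinfo=timezone.utc), "earthquake damage in city"),
    (datetime(2019, 1, 1, 13, tzinfo=timezone.utc), "new phone release today"),
    (datetime(2019, 1, 3, 9, tzinfo=timezone.utc), "earthquake relief donations"),
]
pseudo_docs = aggregate(posts)
for slice_id in sorted(pseudo_docs):
    print(slice_id, pseudo_docs[slice_id])
```

The resulting pseudo-documents would then be passed to an LDA implementation in place of the individual posts. Real Weibo text would also need Chinese word segmentation rather than whitespace splitting.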
