
BTM: Topic Modeling over Short Texts


Abstract

Short texts are popular on today's web, especially with the emergence of social media. Inferring topics from large-scale short texts has become a critical but challenging task for many content analysis applications. Conventional topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) learn topics from document-level word co-occurrences by modeling each document as a mixture of topics, so their inference suffers from the sparsity of word co-occurrence patterns in short texts. In this paper, we propose a novel way for short text topic modeling, referred to as the biterm topic model (BTM). BTM learns topics by directly modeling the generation of word co-occurrence patterns (i.e., biterms) in the corpus, making the inference effective with the rich corpus-level information. To cope with large-scale short text data, we further introduce two online algorithms for BTM for efficient topic learning. Experiments on real-world short text collections show that BTM can discover more prominent and coherent topics, and significantly outperform the state-of-the-art baselines. We also demonstrate the appealing performance of the two online BTM algorithms in both time efficiency and topic learning.
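To make the biterm notion concrete: a biterm is an unordered pair of distinct words co-occurring in the same short text, and BTM pools the biterms from the whole corpus rather than estimating co-occurrence per document. The sketch below shows only this extraction step, assuming already-tokenized documents; the function name and toy corpus are illustrative and not taken from the paper.

from itertools import combinations

def extract_biterms(doc_tokens):
    # A biterm is an unordered pair of distinct words from the same
    # short document; BTM models the generation of the corpus-level
    # set of biterms instead of per-document topic mixtures.
    return [tuple(sorted(pair)) for pair in combinations(doc_tokens, 2)]

# Hypothetical toy corpus of tokenized short texts (illustrative only).
corpus = [
    ["short", "text", "topic"],
    ["topic", "model", "inference"],
]

biterms = [b for doc in corpus for b in extract_biterms(doc)]
print(biterms)
# Each 3-word document yields 3 biterms; all biterms are pooled corpus-wide.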
