
BTM: Topic Modeling over Short Texts


Abstract

Short texts are popular on today's web, especially with the emergence of social media. Inferring topics from large-scale short texts has become a critical but challenging task for many content analysis applications. Conventional topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) learn topics from document-level word co-occurrences by modeling each document as a mixture of topics, so their inference suffers from the sparsity of word co-occurrence patterns in short texts. In this paper, we propose a novel way for short text topic modeling, referred to as the biterm topic model (BTM). BTM learns topics by directly modeling the generation of word co-occurrence patterns (i.e., biterms) in the corpus, making the inference effective with the rich corpus-level information. To cope with large-scale short text data, we further introduce two online algorithms for BTM for efficient topic learning. Experiments on real-world short text collections show that BTM can discover more prominent and coherent topics, and significantly outperform the state-of-the-art baselines. We also demonstrate the appealing performance of the two online BTM algorithms in both time efficiency and topic learning.
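To make the biterm notion concrete: a biterm is an unordered pair of distinct words co-occurring in the same short text, and BTM pools the biterms from the whole corpus rather than estimating co-occurrence per document. The sketch below shows only this extraction step, assuming already-tokenized documents; the function name and toy corpus are illustrative and not taken from the paper.

from itertools import combinations

def extract_biterms(doc_tokens):
    # A biterm is an unordered pair of distinct words from the same
    # short document; BTM models the generation of the corpus-level
    # set of biterms instead of per-document topic mixtures.
    return [tuple(sorted(pair)) for pair in combinations(doc_tokens, 2)]

# Hypothetical toy corpus of tokenized short texts (illustrative only).
corpus = [
    ["short", "text", "topic"],
    ["topic", "model", "inference"],
]

biterms = [b for doc in corpus for b in extract_biterms(doc)]
print(biterms)
# Each 3-word document yields 3 biterms; all biterms are pooled corpus-wide.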
