
FastBTM: Reducing the sampling time for biterm topic model



Abstract

Due to the popularity of social networks such as microblogs and Twitter, a vast amount of short text data is created every day. Research on short text, such as topic inference, has therefore become increasingly significant. The biterm topic model (BTM) benefits from the word co-occurrence patterns of the corpus, which makes it perform better than conventional topic models at uncovering latent semantic relevance in short text. However, BTM resorts to Gibbs sampling to infer topics, which is very time consuming, especially for large-scale datasets or when the number of topics is extremely large: it requires O(K) operations per sample, where K denotes the number of topics in the corpus. In this paper, we propose FastBTM, an acceleration algorithm that uses an efficient sampling method for BTM and converges much faster than BTM without degrading topic quality. FastBTM is based on Metropolis-Hastings and the alias method, both of which have been widely adopted in the latent Dirichlet allocation (LDA) model and have achieved outstanding speedup. FastBTM effectively reduces the sampling complexity of the biterm topic model from O(K) to O(1) amortized time. We carry out a number of experiments on three datasets: two short text datasets, the Tweets2011 Collection and the Yahoo! Answers dataset, and one long document dataset, the Enron dataset. Our experimental results show that as the number of topics K increases, the gap in running time between FastBTM and BTM grows especially large. In addition, FastBTM is effective for both short text datasets and long document datasets. (C) 2017 Elsevier B.V. All rights reserved.
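The O(1) amortized sampling claimed in the abstract rests on the alias method: after an O(K) preprocessing step, each draw from a K-outcome discrete distribution costs constant time. Below is a minimal illustrative sketch of Vose's variant of the alias method in Python; it is not the authors' implementation, and the function names are our own.

```python
import random

def build_alias_table(probs):
    """Vose's alias method: O(K) preprocessing so that every later
    draw from the discrete distribution `probs` takes O(1) time."""
    k = len(probs)
    scaled = [p * k for p in probs]          # rescale so the mean mass is 1
    prob = [0.0] * k                         # acceptance probability per column
    alias = [0] * k                          # fallback outcome per column
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]                  # column s keeps its own mass ...
        alias[s] = l                         # ... and borrows the rest from l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for leftover in small + large:           # numerical leftovers: full columns
        prob[leftover] = 1.0
    return prob, alias

def alias_draw(prob, alias, rng=random):
    """One O(1) draw: pick a uniform column, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

In FastBTM-style samplers the table is built from a stale copy of the topic distribution and reused for many draws, with Metropolis-Hastings accept/reject steps correcting for the staleness; rebuilding the table only occasionally is what makes the cost O(1) amortized rather than O(1) worst case.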

Bibliographic details

  • Source
    《Knowledge-Based Systems》 | 2017, Issue 15 | pp. 11-20 | 10 pages
  • Author affiliations

    Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China;



    Beijing Univ Posts & Telecommun, Beijing 100876, Peoples R China;

    Shanghai Jiao Tong Univ, Shanghai 200240, Peoples R China;

  • Indexing
  • Original format: PDF
  • Language: eng
  • CLC classification
  • Keywords

    BTM; Topic model; Alias method; Metropolis-Hastings; Acceleration algorithm;


