International Joint Conference on Neural Networks

Optimize collapsed Gibbs sampling for biterm topic model by alias method


Abstract

With the popularity of social networks such as microblogs and Twitter, topic inference for short text is increasingly significant and essential for many content analysis tasks. The biterm topic model (BTM) is superior to conventional topic models in uncovering latent semantic relevance in short text. However, the Gibbs sampling employed by BTM is very time-consuming when inferring topics, especially for large-scale datasets: it requires O(K) operations per sample, where K denotes the number of topics in the corpus. In this paper, we propose an acceleration algorithm for BTM, FastBTM, using an efficient sampling method that requires only O(1) amortized time per sample, whereas traditional methods scale linearly with the number of topics. FastBTM is based on the Metropolis-Hastings algorithm and the alias method, both of which have been widely adopted in the latent Dirichlet allocation (LDA) model and have achieved outstanding speedups. We carry out a number of experiments on the Tweets2011 Collection dataset and the Enron dataset, indicating that our method is robust for both short texts and normal documents. With K = 1000, our method runs approximately 9 times faster per iteration than the traditional Gibbs sampling method. The source code of FastBTM can be obtained from https://github.com/paperstudy/FastBTM.
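To make the two ingredients named in the abstract concrete, below is a minimal Python sketch of (a) Vose's variant of the alias method, which builds an O(K) lookup table supporting O(1) draws from a fixed discrete distribution over K topics, and (b) an independence Metropolis-Hastings step that corrects for the table being stale, i.e., built from an earlier state of the sampler. The names AliasTable, mh_step, target, q_table, and q_mass are illustrative assumptions, not the paper's API; the authors' actual FastBTM implementation is at the repository linked above.

    import random

    class AliasTable:
        # Vose's alias method: O(K) construction, then O(1) sampling
        # from a fixed discrete distribution over K outcomes.
        def __init__(self, weights):
            k = len(weights)
            total = sum(weights)
            scaled = [w * k / total for w in weights]  # mean bucket mass = 1
            self.prob = [0.0] * k
            self.alias = [0] * k
            small = [i for i, p in enumerate(scaled) if p < 1.0]
            large = [i for i, p in enumerate(scaled) if p >= 1.0]
            while small and large:
                s, l = small.pop(), large.pop()
                self.prob[s] = scaled[s]   # bucket s keeps mass scaled[s]
                self.alias[s] = l          # and donates its remainder to l
                scaled[l] -= 1.0 - scaled[s]
                (small if scaled[l] < 1.0 else large).append(l)
            for i in small + large:        # leftovers are full buckets
                self.prob[i] = 1.0

        def sample(self):
            # One uniform bucket choice plus one biased coin flip: O(1).
            i = random.randrange(len(self.prob))
            return i if random.random() < self.prob[i] else self.alias[i]

    def mh_step(t_old, target, q_table, q_mass):
        # Independence Metropolis-Hastings: propose from the (possibly
        # stale) alias table and accept with probability
        # min(1, p(t') q(t_old) / (p(t_old) q(t'))), so accepted samples
        # still follow the true conditional p even though q is outdated.
        t_new = q_table.sample()
        accept = min(1.0, (target(t_new) * q_mass[t_old]) /
                          (target(t_old) * q_mass[t_new]))
        return t_new if random.random() < accept else t_old

The O(1) amortized cost follows from the usual rebuild schedule for alias-based topic samplers (as in alias LDA; the exact schedule in FastBTM is an assumption here): a table built in O(K) time serves on the order of K draws before it is refreshed, so construction cost averages out to O(1) per sample, while the Metropolis-Hastings acceptance keeps the chain targeting the exact BTM conditional despite the stale proposal.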
