Online Sparse Collapsed Hybrid Variational-Gibbs Algorithm for Hierarchical Dirichlet Process Topic Models

European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

Abstract

Topic models for text analysis are most commonly trained using either Gibbs sampling or variational Bayes. Recently, hybrid variational-Gibbs algorithms have been found to combine the best of both worlds. Variational algorithms are fast to converge and more efficient for inference on new documents. Gibbs sampling enables sparse updates since each token is only associated with one topic instead of a distribution over all topics. Additionally, Gibbs sampling is unbiased. Although Gibbs sampling takes longer to converge, it is guaranteed to arrive at the true posterior after infinitely many iterations. By combining the two methods it is possible to reduce the bias of variational methods while simultaneously speeding up variational updates. This idea has previously been applied to standard latent Dirichlet allocation (LDA). We propose a new sampling method that enables the application of the idea to the nonparametric version of LDA, hierarchical Dirichlet process topic models. Our fast sampling method leads to a significant speedup of variational updates as compared to other sampling methods. Experiments show that training of our topic model converges to a better log-likelihood than previously existing variational methods and converges faster than Gibbs sampling in the batch setting.
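The sparse-update claim is easiest to see in a standard collapsed Gibbs sampler for LDA, the parametric setting the abstract builds on (the paper's actual contribution, a sampler for the HDP case combined with variational updates, is not reproduced here). The sketch below is purely illustrative, and all names in it (n_dk, n_kw, n_k) are assumptions, not from the paper: because each token carries exactly one topic, resampling it changes a single entry of each count table rather than a dense vector over all K topics.

```python
import numpy as np

rng = np.random.default_rng(0)
V, K, alpha, beta = 50, 5, 0.1, 0.01

# Toy corpus: 10 documents of 20 word ids each.
docs = [rng.integers(0, V, size=20).tolist() for _ in range(10)]

# Random initial topic assignments and matching count tables.
z = [[int(rng.integers(0, K)) for _ in doc] for doc in docs]
n_dk = np.zeros((len(docs), K))   # document-topic counts
n_kw = np.zeros((K, V))           # topic-word counts
n_k = np.zeros(K)                 # per-topic totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        n_dk[d, k] += 1
        n_kw[k, w] += 1
        n_k[k] += 1

def collapsed_gibbs_sweep():
    """Resample every token's single topic assignment once."""
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k_old = z[d][i]
            # Remove the token from the counts (topic mixtures are
            # integrated out, hence "collapsed").
            n_dk[d, k_old] -= 1
            n_kw[k_old, w] -= 1
            n_k[k_old] -= 1
            # Full conditional p(z = k | rest), up to normalization.
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            p /= p.sum()
            k_new = int(rng.choice(K, p=p))
            # Restore counts: only topic k_new's entries change, which
            # is the sparsity the abstract refers to.
            z[d][i] = k_new
            n_dk[d, k_new] += 1
            n_kw[k_new, w] += 1
            n_k[k_new] += 1

for _ in range(50):
    collapsed_gibbs_sweep()
```

A variational update, by contrast, would maintain a dense distribution over all K topics for each token, which is what the hybrid approach avoids on the sampling side.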
