European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

Online Sparse Collapsed Hybrid Variational-Gibbs Algorithm for Hierarchical Dirichlet Process Topic Models



Abstract

Topic models for text analysis are most commonly trained using either Gibbs sampling or variational Bayes. Recently, hybrid variational-Gibbs algorithms have been found to combine the best of both worlds. Variational algorithms are fast to converge and more efficient for inference on new documents. Gibbs sampling enables sparse updates since each token is only associated with one topic instead of a distribution over all topics. Additionally, Gibbs sampling is unbiased. Although Gibbs sampling takes longer to converge, it is guaranteed to arrive at the true posterior after infinitely many iterations. By combining the two methods it is possible to reduce the bias of variational methods while simultaneously speeding up variational updates. This idea has previously been applied to standard latent Dirichlet allocation (LDA). We propose a new sampling method that enables the application of the idea to the nonparametric version of LDA, hierarchical Dirichlet process topic models. Our fast sampling method leads to a significant speedup of variational updates as compared to other sampling methods. Experiments show that training of our topic model converges to a better log-likelihood than previously existing variational methods and converges faster than Gibbs sampling in the batch setting.
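The sparse-update property the abstract attributes to Gibbs sampling — each token carries a single topic label, so resampling it touches only a few counts — is easy to see in code. Below is an illustrative collapsed Gibbs sweep for standard LDA, not the paper's HDP algorithm or its fast sampling method; the function name, hyperparameter values, and toy corpus are all assumptions made for this sketch.

```python
import numpy as np

def collapsed_gibbs_sweep(docs, assignments, doc_topic, topic_word, topic_count,
                          alpha=0.1, beta=0.01, rng=None):
    """One sweep of collapsed Gibbs sampling over all tokens (standard LDA).

    Each token is associated with exactly one topic, so resampling it only
    decrements and increments a handful of count-matrix entries -- the
    sparse update that hybrid variational-Gibbs methods exploit.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    K, V = topic_word.shape
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k_old = assignments[d][i]
            # Remove the token's current assignment from the counts.
            doc_topic[d, k_old] -= 1
            topic_word[k_old, w] -= 1
            topic_count[k_old] -= 1
            # Collapsed conditional p(z = k | rest), up to normalization.
            p = (doc_topic[d] + alpha) * (topic_word[:, w] + beta) \
                / (topic_count + V * beta)
            k_new = rng.choice(K, p=p / p.sum())
            # Add the token back under the newly sampled topic.
            doc_topic[d, k_new] += 1
            topic_word[k_new, w] += 1
            topic_count[k_new] += 1
            assignments[d][i] = k_new

# Toy corpus: two documents over a 4-word vocabulary, K = 2 topics.
docs = [[0, 1, 2], [2, 3, 0]]
K, V = 2, 4
rng = np.random.default_rng(42)
assignments = [[int(rng.integers(K)) for _ in doc] for doc in docs]
doc_topic = np.zeros((len(docs), K))
topic_word = np.zeros((K, V))
topic_count = np.zeros(K)
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = assignments[d][i]
        doc_topic[d, k] += 1
        topic_word[k, w] += 1
        topic_count[k] += 1

for _ in range(50):
    collapsed_gibbs_sweep(docs, assignments, doc_topic, topic_word, topic_count,
                          rng=rng)
```

A variational method, by contrast, would store a dense K-dimensional distribution per token; the hybrid approach in the paper keeps the sparse single-label representation while retaining the convergence benefits of variational updates.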


