Scalable Collapsed Inference for High-Dimensional Topic Models

Abstract

The bigger the corpus, the more topics it can potentially support. To truly make full use of massive text corpora, a topic model inference algorithm must therefore scale efficiently in 1) documents and 2) topics, while 3) achieving accurate inference. Previous methods have achieved two out of three of these criteria simultaneously, but never all three at once. In this paper, we develop an online inference algorithm for topic models which leverages stochasticity to scale well in the number of documents, sparsity to scale well in the number of topics, and which operates in the collapsed representation of the topic model for improved accuracy and run-time performance. We use a Monte Carlo inner loop in the online setting to approximate the collapsed variational Bayes updates in a sparse and efficient way, which we accomplish via the Metropolis-Hastings Walker method. We showcase our algorithm on LDA and the recently proposed mixed membership skip-gram topic model. Our method requires only amortized O(k_d) computation per word token, instead of O(K) operations, to converge to a high-quality solution, where k_d, the number of topics occurring in a particular document, satisfies k_d ≪ K, the total number of topics in the corpus.
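The mechanism the abstract names, the Metropolis-Hastings Walker method, pairs Walker's alias method (O(1) draws from a cached proposal distribution after O(K) setup) with a Metropolis-Hastings acceptance step that corrects for that cached proposal being stale or only approximate. The sketch below is a minimal, generic illustration of that pairing, not the paper's implementation; all names (build_alias_table, mh_resample, the toy target p and proposal q) are illustrative assumptions.

```python
import numpy as np

def build_alias_table(probs):
    """Walker's alias method: O(K) setup so that later draws are O(1).
    `probs` is a normalized discrete distribution over K outcomes."""
    K = len(probs)
    prob = np.zeros(K)
    alias = np.zeros(K, dtype=np.int64)
    scaled = np.asarray(probs, dtype=float) * K
    small = [i for i in range(K) if scaled[i] < 1.0]
    large = [i for i in range(K) if scaled[i] >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]           # probability of keeping bucket s
        alias[s] = l                  # otherwise redirect to l
        scaled[l] -= 1.0 - scaled[s]  # l donated mass to fill s's bucket
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:           # leftovers are (numerically) full buckets
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias, rng):
    """One O(1) sample from a prebuilt alias table."""
    i = rng.integers(len(prob))
    return int(i) if rng.random() < prob[i] else int(alias[i])

def mh_resample(z, p_target, q_proposal, prob, alias, rng, n_steps=2):
    """A few Metropolis-Hastings steps using cheap alias-table proposals.
    p_target[k]  : unnormalized current conditional for topic k
    q_proposal[k]: unnormalized (possibly stale) weights the alias table
                   was built from; the acceptance ratio corrects for the
                   mismatch, so the chain still targets p_target."""
    for _ in range(n_steps):
        z_new = alias_draw(prob, alias, rng)
        ratio = (p_target[z_new] * q_proposal[z]) / (p_target[z] * q_proposal[z_new])
        if rng.random() < min(1.0, ratio):
            z = z_new
    return z

rng = np.random.default_rng(0)
K = 1000
q = rng.random(K) + 1e-3              # cached weights, e.g. stale word-topic counts
prob, alias = build_alias_table(q / q.sum())
p = q * rng.uniform(0.5, 2.0, K)      # true conditional has drifted since caching
z = mh_resample(int(rng.integers(K)), p, q, prob, alias, rng)
print("resampled topic:", z)
```

Roughly speaking, amortizing the O(K) table construction over many O(1) draws, and evaluating the target only at the current and proposed topics rather than over all K, is what drives the amortized O(k_d) per-token figure claimed in the abstract.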
