JMLR: Workshop and Conference Proceedings

Spectral Methods for Correlated Topic Models


Abstract

In this paper we propose spectral methods with provable guarantees for learning a broad class of topic models that generalizes the popular Latent Dirichlet Allocation (LDA). We overcome LDA's inability to incorporate arbitrary topic correlations by assuming that the hidden topic proportions are drawn from a flexible class of Normalized Infinitely Divisible (NID) distributions. NID distributions are generated by normalizing a family of independent Infinitely Divisible (ID) random variables; the Dirichlet distribution is a special case obtained by normalizing a set of Gamma random variables. We prove that this flexible class of topic models can be learnt via spectral methods using only moments up to the third order, with (low-order) polynomial sample and computational complexity. The proof rests on a key new technique, derived here, that diagonalizes the moments of the NID distribution through an efficient procedure requiring only the evaluation of univariate integrals, even though the moments being handled are high-dimensional and multivariate. To assess the performance of our proposed Latent NID topic model, we use two real datasets of articles collected from the New York Times and PubMed. Our experiments yield improved perplexity on both datasets compared with the baseline.
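The normalization construction described in the abstract can be illustrated with a minimal sketch. It assumes NumPy, and the function names (`normalize_rows`, `sample_dirichlet_via_gamma`, `sample_nid_inverse_gaussian`) are hypothetical, introduced only for this illustration. The inverse Gaussian is used here as one example of a positive infinitely divisible distribution; the abstract does not say which ID families the authors consider beyond Gamma. The sketch only samples topic proportions and does not implement the spectral learning procedure.

```python
import numpy as np


def normalize_rows(z):
    """Divide each row by its sum so every row lies on the probability simplex."""
    return z / z.sum(axis=1, keepdims=True)


def sample_dirichlet_via_gamma(alpha, n_docs, rng):
    """Dirichlet(alpha) draws obtained by normalizing independent Gamma(alpha_i, 1) variables."""
    alpha = np.asarray(alpha, dtype=float)
    z = rng.gamma(shape=alpha, scale=1.0, size=(n_docs, alpha.size))
    return normalize_rows(z)


def sample_nid_inverse_gaussian(mean, scale, n_docs, rng):
    """Illustrative NID draws: normalize independent inverse Gaussian (Wald) variables.

    The choice of inverse Gaussian is an assumption made for this example.
    """
    mean = np.asarray(mean, dtype=float)
    z = rng.wald(mean, scale, size=(n_docs, mean.size))
    return normalize_rows(z)


rng = np.random.default_rng(0)
alpha = np.array([0.5, 1.0, 2.5])

theta_dir = sample_dirichlet_via_gamma(alpha, n_docs=20000, rng=rng)
theta_nid = sample_nid_inverse_gaussian(mean=[1.0, 2.0, 3.0], scale=1.0, n_docs=20000, rng=rng)

# Each row is a vector of topic proportions for one document.
print(theta_dir[:3])
print(theta_nid[:3])
```

Both samplers share the same normalization step and differ only in the distribution of the independent positive variables being normalized, which is the sense in which the Dirichlet is a special case of the NID family.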
