Annual Conference on Neural Information Processing Systems
When Are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

Abstract

Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime, where the number of latent topics can greatly exceed the size of the observed word vocabulary. While general overcomplete topic models are not identifiable, we establish generic identifiability under a constraint, referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of "higher-order" expansion conditions on the topic-word matrix or the population structure of the model. This set of higher-order expansion conditions allows for overcomplete models, and requires the existence of a perfect matching from latent topics to higher-order observed words. We establish that random structured topic models are identifiable with high probability in the overcomplete regime. Our identifiability results allow for general (non-degenerate) distributions for modeling the topic proportions, and thus we can handle arbitrarily correlated topics in our framework. Our identifiability results imply uniqueness of a class of tensor decompositions with structured sparsity, which is contained in the class of Tucker decompositions but is more general than the Candecomp/Parafac (CP) decomposition.
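
The containment mentioned in the last sentence of the abstract rests on a standard fact: a CP decomposition is a Tucker decomposition whose core tensor is nonzero only on its superdiagonal. The sketch below checks that fact numerically on a small symmetric third-order example. It is only an illustration under stated assumptions, not the paper's construction: the names `tucker_entry`, `cp_entry`, `n_words`, and `n_topics` are hypothetical, the factor matrix is random, and the structured-sparsity class studied in the paper is not implemented here.

```python
import itertools
import math
import random


def tucker_entry(core, A, i, j, k):
    """Entry (i, j, k) of a symmetric Tucker form: sum over all core indices (a, b, c)."""
    r = len(core)
    return sum(
        core[a][b][c] * A[i][a] * A[j][b] * A[k][c]
        for a, b, c in itertools.product(range(r), repeat=3)
    )


def cp_entry(weights, A, i, j, k):
    """Entry (i, j, k) of a symmetric CP form: one rank-one term per component."""
    return sum(w * A[i][r] * A[j][r] * A[k][r] for r, w in enumerate(weights))


random.seed(0)
n_words, n_topics = 3, 4  # hypothetical sizes; more components than dimensions

# Random factor matrix (rows: words, columns: topics) and component weights.
A = [[random.random() for _ in range(n_topics)] for _ in range(n_words)]
weights = [random.random() for _ in range(n_topics)]

# Tucker core that is zero everywhere except on the superdiagonal.
core = [
    [
        [weights[a] if a == b == c else 0.0 for c in range(n_topics)]
        for b in range(n_topics)
    ]
    for a in range(n_topics)
]

# With a superdiagonal core, the Tucker form reduces to the CP form entrywise.
cp_is_special_case = all(
    math.isclose(
        tucker_entry(core, A, i, j, k),
        cp_entry(weights, A, i, j, k),
        rel_tol=1e-9,
        abs_tol=1e-12,
    )
    for i, j, k in itertools.product(range(n_words), repeat=3)
)
print(cp_is_special_case)
```

Allowing some off-diagonal core entries to be nonzero moves away from CP toward a general Tucker form; the abstract places the class studied in the paper between these two extremes.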
