Source: AAAI Conference on Artificial Intelligence

Probabilistic Non-Negative Matrix Factorization and Its Robust Extensions for Topic Modeling

Abstract

Traditional topic models fitted by maximum likelihood estimation inevitably suffer from the assumption that words are conditionally independent given a document's topic distribution. In this paper, we follow the generative procedure of topic models and learn the topic-word distribution and the topic distribution by directly approximating the word-document co-occurrence matrix with a matrix decomposition technique. The proposed methods are: (1) approximating the normalized document-word conditional distribution with a document probability matrix and a word probability matrix, based on probabilistic non-negative matrix factorization (NMF); (2) since standard NMF is well known to be non-robust to noise and outliers, extending the probabilistic NMF of the topic model to robust versions that use ℓ_{2,1}-norm and capped ℓ_{2,1}-norm based loss functions, respectively. The proposed framework inherits the explicit probabilistic meaning of the factors in topic models and, at the same time, makes the conditional independence assumption on words unnecessary. Straightforward and efficient algorithms are employed to solve the corresponding non-smooth and non-convex problems. Experimental results on several benchmark datasets illustrate the effectiveness and superiority of the proposed methods.
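The abstract does not give the update rules, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a words-by-documents matrix X whose columns are normalized to p(word | document), and factors it as X ≈ W H with the columns of W read as p(word | topic) and the columns of H as p(topic | document). The standard multiplicative updates for the squared Frobenius loss are used as a stand-in optimizer; the two loss functions only evaluate the ℓ_{2,1} and capped ℓ_{2,1} objectives and are not minimized here. All names (`probabilistic_nmf`, `l21_loss`, `capped_l21_loss`, `theta`) are hypothetical.

```python
import numpy as np


def l21_loss(X, W, H):
    """l_{2,1} loss: sum over documents (columns) of the Euclidean residual norm."""
    return float(np.linalg.norm(X - W @ H, axis=0).sum())


def capped_l21_loss(X, W, H, theta):
    """Capped l_{2,1} loss: each document's residual norm is clipped at theta,
    so a single outlier document contributes at most theta."""
    return float(np.minimum(np.linalg.norm(X - W @ H, axis=0), theta).sum())


def probabilistic_nmf(X, n_topics, n_iter=200, eps=1e-12, seed=0):
    """Factor a column-stochastic X (words x docs) as W @ H.

    Assumption: plain multiplicative updates for the Frobenius loss,
    followed by a rescaling that makes every column of W sum to 1.
    """
    rng = np.random.default_rng(seed)
    n_words, n_docs = X.shape
    W = rng.random((n_words, n_topics)) + eps
    H = rng.random((n_topics, n_docs)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Move the scale of each topic from W into H; the product W @ H
        # is unchanged up to eps.
        scale = W.sum(axis=0) + eps
        W /= scale
        H *= scale[:, None]
    # Final projection so that each column of H is a distribution over topics.
    H /= H.sum(axis=0, keepdims=True) + eps
    return W, H


if __name__ == "__main__":
    # Toy word-document counts: 6 words, 4 documents (made-up numbers).
    counts = np.array([
        [5, 4, 0, 0],
        [3, 6, 0, 1],
        [4, 2, 1, 0],
        [0, 0, 6, 4],
        [0, 1, 3, 5],
        [1, 0, 4, 6],
    ], dtype=float)
    X = counts / counts.sum(axis=0, keepdims=True)  # columns are p(word | doc)
    W, H = probabilistic_nmf(X, n_topics=2)
    print("l21 loss:", l21_loss(X, W, H))
    print("capped l21 loss:", capped_l21_loss(X, W, H, theta=0.1))
```

Because the columns of W and H each sum to one, every column of W H also sums to one, which is what lets the factors keep a probabilistic reading. The capped loss bounds the influence of any single document, which is the usual intuition behind capping as a robustness device.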