
Locally weighted embedding topic modeling by Markov random walk structure approximation and sparse regularization


Abstract

Topic modeling is a practical method for learning interpretable models of text corpora and has become a key problem in document representation. Some recently proposed topic models incorporate the intrinsic geometrical information of the document manifold and yield a discriminative topic representation. However, existing manifold-inspired topic models fail to provide probability weighting information for the local geometrical pattern, which limits their ability to estimate the intrinsic semantic information of the topic representation. In this paper, we consider the problem of topic modeling with the intrinsic structure of the document manifold and propose an unsupervised AutoEncoder-based topic modeling framework, named the locally weighted embedding topic model (LWE-TM). Unlike existing manifold-inspired topic models, LWE-TM defines a group of probability coefficients that uncover the local geometrical pattern through the Markov random walk structure of the affinity graph, and regularizes the training of a sparse AutoEncoder (sAE) to explicitly recover this local geometrical pattern in the topic encoding. Under the regularized training framework, the encoding network becomes locally invariant around the neighborhood of the document manifold and enables ready topic inference for out-of-sample documents, improving the generalization and discrimination of the topic encoding. Experimental results on two widely used corpora demonstrate the superiority of LWE-TM over the compared models in document modeling, document clustering, and classification tasks. (C) 2018 Elsevier B.V. All rights reserved.
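
The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the paper's definition of LWE-TM. It assumes a Gaussian-weighted k-nearest-neighbour affinity graph, takes the row-normalized random walk transition matrix as the "probability coefficients", and uses a simple squared-error penalty that pulls each topic code toward the weighted average of its neighbours' codes. The function names, the parameters `k`, `sigma`, and `steps`, and the toy data are all hypothetical.

```python
import numpy as np


def knn_affinity(X, k=3, sigma=1.0):
    """Symmetric k-nearest-neighbour affinity graph with Gaussian edge weights."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between document vectors.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(d2[i])
        neighbours = [j for j in order if j != i][:k]
        W[i, neighbours] = np.exp(-d2[i, neighbours] / (2.0 * sigma ** 2))
    # Keep an edge if either endpoint selected the other.
    return np.maximum(W, W.T)


def random_walk_weights(W, steps=1):
    """Row-stochastic transition matrix of a random walk on W, raised to `steps`."""
    P = W / W.sum(axis=1, keepdims=True)
    return np.linalg.matrix_power(P, steps)


def local_embedding_penalty(H, P):
    """Mean squared gap between each code and the P-weighted average of codes."""
    return float(np.mean(np.sum((H - P @ H) ** 2, axis=1)))


# Toy data (assumption): 6 "documents" with 10 term features and 4-dim topic codes.
rng = np.random.default_rng(0)
X = rng.random((6, 10))
H = rng.random((6, 4))  # stands in for the encoder outputs of an autoencoder

W = knn_affinity(X, k=2)
P = random_walk_weights(W, steps=2)
print("row sums of P:", P.sum(axis=1))
print("local embedding penalty:", local_embedding_penalty(H, P))
```

In a training loop, a penalty of this kind would be added to the autoencoder's reconstruction and sparsity losses with a weighting coefficient; how the paper actually combines these terms is not stated in the abstract.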

Bibliographic details

  • Source
    Neurocomputing | 2018, Issue 12 | pp. 35-50 | 16 pages
  • Author affiliations

    Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China;

    Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China;

    Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China;

    Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China;

    Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China;

    Univ Engn & Technol, Dept Comp Sci, Taxila 47050, Punjab, Pakistan;

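  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords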

    Topic model; Sparse AutoEncoder; Markov random walk; Affine mapping;

