Information Processing & Management

Neural variational sparse topic model for sparse explainable text representation


Abstract

Texts are the major information carrier for internet users, and learning their latent representations has important research and practical value. Neural topic models have been proposed and perform well at extracting interpretable latent topics and text representations. However, two major limitations remain: (1) these methods generally ignore the contextual information of texts and have limited feature representation ability due to their shallow feed-forward network architectures; (2) the sparsity of representations in the topic semantic space is ignored. To address these issues, we propose a semantic reinforcement neural variational sparse topic model (SR-NSTM) for explainable and sparse latent text representation learning. Compared with existing neural topic models, SR-NSTM models the generative process of texts with probabilistic distributions parameterized by neural networks and incorporates a Bi-directional LSTM to embed contextual information at the document level. It achieves sparse posterior representations over documents and words with a zero-mean Laplace distribution, and sparse representations over topics with sparsemax. Moreover, we propose a supervised extension of SR-NSTM that adds max-margin posterior regularization to tackle supervised tasks. Neural variational inference is used to learn both models efficiently. Experimental results on the Web Snippets, 20Newsgroups, BBC, and Biomedical datasets demonstrate that incorporating contextual information and revisiting the generative process improve performance, making our models competitive at learning coherent topics and explainable sparse representations of texts.
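
The abstract does not spell out the inference network, so the following is a minimal PyTorch sketch of the stated idea under explicit assumptions: a Bi-directional LSTM encodes the document, its pooled states parameterize a Laplace posterior over the latent representation, and a reparameterized sample is regularized toward a zero-mean Laplace prior. The class name LaplaceDocEncoder, the mean-pooling step, and all dimensions are hypothetical choices for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn
from torch.distributions import Laplace, kl_divergence

class LaplaceDocEncoder(nn.Module):
    """Hypothetical inference network: Bi-LSTM context encoding followed by
    a Laplace posterior over the latent vector (a sketch, not the paper's
    exact architecture)."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, latent_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.loc = nn.Linear(2 * hidden_dim, latent_dim)
        self.log_scale = nn.Linear(2 * hidden_dim, latent_dim)

    def forward(self, tokens):                  # tokens: (batch, seq_len) ids
        states, _ = self.bilstm(self.embed(tokens))   # (batch, seq_len, 2H)
        pooled = states.mean(dim=1)                   # document-level context
        posterior = Laplace(self.loc(pooled), self.log_scale(pooled).exp())
        z = posterior.rsample()                 # reparameterized: grads flow
        prior = Laplace(torch.zeros_like(z), torch.ones_like(z))  # zero-mean
        kl = kl_divergence(posterior, prior).sum(dim=-1)
        return z, kl
```

In a full model, z would feed a decoder that reconstructs the document's bag of words, and the KL term would enter the evidence lower bound alongside the reconstruction loss.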
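Sparsemax, which the abstract uses to obtain sparse representations over topics, is the published activation of Martins & Astudillo (2016): a Euclidean projection of the logits onto the probability simplex that, unlike softmax, returns exact zeros. A self-contained sketch follows, assuming it is applied to the sampled z from the sketch above:

```python
import torch

def sparsemax(z, dim=-1):
    """Euclidean projection of z onto the probability simplex
    (Martins & Astudillo, 2016); produces exactly-zero entries."""
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    k = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype)
    shape = [1] * z.dim()
    shape[dim] = -1
    k = k.view(shape)                        # broadcastable rank index
    z_cumsum = z_sorted.cumsum(dim)
    support = (1 + k * z_sorted) > z_cumsum  # coordinates kept by the projection
    k_z = support.sum(dim=dim, keepdim=True).to(z.dtype)
    tau = (z_cumsum.gather(dim, k_z.long() - 1) - 1) / k_z  # threshold
    return torch.clamp(z - tau, min=0)

# e.g. sparsemax(torch.tensor([2.0, 1.0, 0.1])) -> tensor([1., 0., 0.])
```

Applying theta = sparsemax(z) yields topic proportions in which irrelevant topics receive exactly zero weight, which is the kind of explainable sparsity the abstract describes.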
