Machine Learning and Knowledge Extraction

Interpretable Topic Extraction and Word Embedding Learning Using Non-Negative Tensor DEDICOM



Abstract

Unsupervised topic extraction is a vital step in automatically distilling concise content information from large text corpora. Existing topic extraction methods lack the capability of linking relations between these topics, which would further aid text understanding. Therefore, we propose utilizing the Decomposition into Directional Components (DEDICOM) algorithm, which provides a uniquely interpretable matrix factorization for symmetric and asymmetric square matrices and tensors. We constrain DEDICOM to row-stochasticity and non-negativity in order to factorize pointwise mutual information matrices and tensors of text corpora. We identify latent topic clusters and their relations within the vocabulary and simultaneously learn interpretable word embeddings. Further, we introduce multiple methods based on alternating gradient descent to efficiently train constrained DEDICOM algorithms. We evaluate the qualitative topic modeling and word embedding performance of our proposed methods on several datasets, including a novel New York Times news dataset, and demonstrate how the DEDICOM algorithm provides deeper text analysis than competing matrix factorization approaches.
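To make the factorization concrete, the following is a minimal NumPy sketch of the two-way DEDICOM decomposition S ≈ A R Aᵀ described above, with A constrained to be non-negative and row-stochastic, trained by alternating projected gradient descent. The function name, hyperparameters, and projection scheme are illustrative assumptions, not the authors' implementation (which handles tensors of PMI slices and more refined update rules).

```python
import numpy as np

def dedicom(S, k, steps=2000, lr=1e-2, seed=0):
    """Sketch of matrix DEDICOM: S =~ A @ R @ A.T, with A (n x k)
    non-negative and row-stochastic and R (k x k) unconstrained.
    Trained by alternating projected gradient descent on the
    squared Frobenius reconstruction error. Illustrative only."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    A = rng.random((n, k))
    A /= A.sum(axis=1, keepdims=True)          # row-stochastic init
    R = rng.random((k, k))
    for _ in range(steps):
        # Gradient step on A with R fixed: dL/dA = 2(E A R^T + E^T A R)
        E = A @ R @ A.T - S                    # residual
        gA = 2 * (E @ A @ R.T + E.T @ A @ R)
        A = np.clip(A - lr * gA, 1e-12, None)  # project onto non-negativity
        A /= A.sum(axis=1, keepdims=True)      # re-normalize rows to sum to 1
        # Gradient step on R with A fixed: dL/dR = 2 A^T E A
        E = A @ R @ A.T - S
        R = R - lr * (2 * A.T @ E @ A)
    return A, R
```

In this reading, the rows of A give each word's (interpretable, simplex-valued) topic membership, and R captures the relations between topics; the tensor variant of the paper replaces R with one relation matrix per tensor slice while sharing A.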
