Machine Learning and Knowledge Extraction

Interpretable Topic Extraction and Word Embedding Learning Using Non-Negative Tensor DEDICOM



Abstract

Unsupervised topic extraction is a vital step in automatically distilling concise content information from large text corpora. Existing topic extraction methods lack the capability of linking relations between these topics, which would further aid text understanding. Therefore, we propose utilizing the Decomposition into Directional Components (DEDICOM) algorithm, which provides a uniquely interpretable matrix factorization for symmetric and asymmetric square matrices and tensors. We constrain DEDICOM to row-stochasticity and non-negativity in order to factorize pointwise mutual information matrices and tensors of text corpora. We identify latent topic clusters and their relations within the vocabulary and simultaneously learn interpretable word embeddings. Further, we introduce multiple methods based on alternating gradient descent to efficiently train constrained DEDICOM algorithms. We evaluate the qualitative topic modeling and word embedding performance of our proposed methods on several datasets, including a novel New York Times news dataset, and demonstrate how the DEDICOM algorithm provides deeper text analysis than competing matrix factorization approaches.
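To make the factorization concrete, the following is a minimal NumPy sketch of the two-way DEDICOM decomposition S ≈ A R Aᵀ described above, with A constrained to be non-negative and row-stochastic, trained by alternating projected gradient descent. The function name, hyperparameters, and projection scheme are illustrative assumptions, not the authors' implementation (which handles tensors of PMI slices and more refined update rules).

```python
import numpy as np

def dedicom(S, k, steps=2000, lr=1e-2, seed=0):
    """Sketch of matrix DEDICOM: S =~ A @ R @ A.T, with A (n x k)
    non-negative and row-stochastic and R (k x k) unconstrained.
    Trained by alternating projected gradient descent on the
    squared Frobenius reconstruction error. Illustrative only."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    A = rng.random((n, k))
    A /= A.sum(axis=1, keepdims=True)          # row-stochastic init
    R = rng.random((k, k))
    for _ in range(steps):
        # Gradient step on A with R fixed: dL/dA = 2(E A R^T + E^T A R)
        E = A @ R @ A.T - S                    # residual
        gA = 2 * (E @ A @ R.T + E.T @ A @ R)
        A = np.clip(A - lr * gA, 1e-12, None)  # project onto non-negativity
        A /= A.sum(axis=1, keepdims=True)      # re-normalize rows to sum to 1
        # Gradient step on R with A fixed: dL/dR = 2 A^T E A
        E = A @ R @ A.T - S
        R = R - lr * (2 * A.T @ E @ A)
    return A, R
```

In this reading, the rows of A give each word's (interpretable, simplex-valued) topic membership, and R captures the relations between topics; the tensor variant of the paper replaces R with one relation matrix per tensor slice while sharing A.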
