IEEE International Conference on Machine Learning and Applications

Unsupervised Topic Model Based Text Network Construction for Learning Word Embeddings



Abstract

Distributed word embeddings have proven remarkably effective at capturing word-level semantic and syntactic regularities in language for many natural language processing tasks. One recently proposed semi-supervised representation learning method, Predictive Text Embedding (PTE), uses both semantically labeled and unlabeled data in information networks to learn text embeddings, achieving state-of-the-art performance compared to other embedding methods. However, PTE relies on supervised label information to construct one of its networks, and many other possible ways of constructing such information networks remain untested. We present two unsupervised methods for constructing a large-scale semantic information network from documents using topic models, which have emerged as a powerful technique for finding useful structure in unstructured text collections by learning distributions over words. The first method uses Latent Dirichlet Allocation (LDA) to build a topic model over the text and constructs a word-topic network whose edge weights are proportional to the word-topic probability distributions. The second method trains an unsupervised neural network to learn the word-document distribution, with a single hidden layer representing a topic distribution. The two weight matrices of the neural network are reinterpreted directly as the edge weights of heterogeneous text networks, which can then be used to train word embeddings that form an effective low-dimensional representation preserving the semantic closeness of words and documents for NLP tasks. We conduct extensive experiments to evaluate the effectiveness of our methods.


