European Conference on IR Research

Aggregating Neural Word Embeddings for Document Representation



Abstract

Recent advances in natural language processing (NLP) have shown that semantically meaningful representations of words can be efficiently acquired by distributed models. In this view, a text document can be treated as a bag-of-word-embeddings (BoWE), and the remaining question is how to obtain a fixed-length vector representation of the document for efficient document processing. Beyond heuristic aggregation methods, recent work has shown that the Fisher kernel (FK) framework can be leveraged to generate document representations from BoWE in a principled way. In that work, words are embedded into a Euclidean space by latent semantic indexing (LSI), and a Gaussian mixture model (GMM) serves as the generative model for nonlinear FK-based aggregation. In this work, we propose an alternative FK-based aggregation method for document representation built on neural word embeddings. Neural embedding models have been shown to produce significantly better word representations than LSI, and semantic relations between neural word embeddings are typically measured by cosine similarity rather than Euclidean distance. We therefore introduce a mixture of von Mises-Fisher distributions (moVMF) as the generative model of neural word embeddings and derive a new FK-based aggregation method for document representation based on BoWE. We report document classification, clustering and retrieval experiments and demonstrate that our model produces state-of-the-art performance compared with existing baseline methods.
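The aggregation step described in the abstract can be pictured with a short sketch. Assuming pre-trained neural word embeddings and an already-fitted moVMF (mixture weights, unit mean directions, concentrations), the snippet below computes a Fisher-vector-style document representation: each L2-normalized word vector is softly assigned to mixture components via posterior responsibilities, and gradients of the log-likelihood with respect to the component mean directions are accumulated into one fixed-length vector. The helper names, the random stand-in parameters, and the power/L2 post-normalization are illustrative conventions from the Fisher-vector literature, not the paper's exact derivation.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function I_v

def log_vmf_norm_const(kappa, d):
    """log C_d(kappa) of the von Mises-Fisher density on the unit sphere in R^d."""
    v = d / 2.0 - 1.0
    log_iv = np.log(ive(v, kappa)) + kappa  # log I_v(kappa), computed stably
    return v * np.log(kappa) - (d / 2.0) * np.log(2.0 * np.pi) - log_iv

def movmf_responsibilities(X, weights, mus, kappas):
    """Posterior P(component k | word vector x) under a fitted moVMF.
    X: (n, d) unit-normalized word embeddings; mus: (K, d) unit mean directions."""
    d = X.shape[1]
    # log alpha_k + log C_d(kappa_k) + kappa_k * mu_k^T x  for every word/component pair
    log_p = (np.log(weights)[None, :]
             + log_vmf_norm_const(kappas, d)[None, :]
             + X @ (mus * kappas[:, None]).T)           # (n, K)
    log_p -= log_p.max(axis=1, keepdims=True)            # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def fisher_vector(doc_embeddings, weights, mus, kappas):
    """Aggregate one document's word embeddings into a fixed-length vector via
    responsibility-weighted gradients of the moVMF log-likelihood w.r.t. the means."""
    X = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    gamma = movmf_responsibilities(X, weights, mus, kappas)      # (n, K)
    # gradient of the log component density w.r.t. mu_k (unit-norm constraint ignored)
    # is kappa_k * x; weight by responsibilities and sum over the document's words
    G = (gamma.T @ X) * kappas[:, None]                          # (K, d)
    fv = G.ravel()
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                       # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)                     # L2 normalization

# Toy usage: random parameters stand in for a fitted moVMF and trained embeddings.
rng = np.random.default_rng(0)
d, K = 50, 4
mus = rng.normal(size=(K, d)); mus /= np.linalg.norm(mus, axis=1, keepdims=True)
kappas = np.full(K, 20.0)
weights = np.full(K, 1.0 / K)
doc = rng.normal(size=(120, d))                  # 120 word vectors for one document
print(fisher_vector(doc, weights, mus, kappas).shape)   # (K * d,) = (200,)
```

The resulting K*d-dimensional vector plays the role of the fixed-length document representation that the FK framework produces from a BoWE; the cosine-based geometry enters through the vMF components, which score words by the inner product with unit mean directions rather than by Euclidean distance.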
