Knowledge-Based Systems

Learning distributed word representation with multi-contextual mixed embedding

Abstract

Learning distributed word representations has been a popular method for various natural language processing applications such as word analogy and similarity, document classification, and sentiment analysis. However, most existing word embedding models exploit only a shallow sliding window as the context for predicting the target word. Because the semantics of each word are also influenced by its global context, as distributional models usually induce word representations from a global co-occurrence matrix, window-based models are insufficient to capture semantic knowledge. In this paper, we propose a novel hybrid model called mixed word embedding (MWE) based on the well-known word2vec toolbox. Specifically, the proposed MWE model combines the two variants of word2vec, i.e., SKIP-GRAM and CBOW, in a seamless way by sharing a common encoding structure, which allows it to capture the syntactic information of words more accurately. Furthermore, it incorporates a global text vector into the CBOW variant so as to capture more semantic information. Our MWE preserves the same time complexity as SKIP-GRAM. To evaluate the MWE model efficiently and adaptively, we study it from both linguistic and application perspectives on English and Chinese datasets. For linguistics, we conduct empirical studies on word analogies and similarities. From the application point of view, we consider the learned latent representations on both document classification and sentiment analysis. The experimental results show that our MWE model is very competitive in all tasks compared with state-of-the-art word embedding models such as CBOW, SKIP-GRAM, and GloVe. (C) 2016 Elsevier B.V. All rights reserved.
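The abstract describes the architecture only at a high level. Below is a minimal NumPy sketch of the general idea: a CBOW-style branch whose averaged context is augmented with a per-document global text vector, and a SKIP-GRAM branch, with both branches reading and writing the same shared embedding matrices. Every concrete detail here (negative sampling with k negatives, the learning rate, averaging the document vector into the context, and names such as W_in, W_out, W_doc, train_pair) is an illustrative assumption, not the authors' exact formulation.

```python
# A toy sketch of a mixed CBOW + SKIP-GRAM objective with shared embeddings.
# All hyperparameters and names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

V, D, dim = 1000, 50, 100              # vocab size, number of documents, embedding dim
W_in  = rng.normal(0, 0.1, (V, dim))   # shared input (word) embeddings
W_out = rng.normal(0, 0.1, (V, dim))   # shared output embeddings (negative sampling)
W_doc = rng.normal(0, 0.1, (D, dim))   # one global text vector per document

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ns_update(h, target, negatives, lr=0.025):
    """One negative-sampling step: push h toward target, away from negatives.
    Updates W_out in place and returns the gradient w.r.t. h, so the caller
    can update whatever produced h."""
    grad_h = np.zeros_like(h)
    for w, label in [(target, 1.0)] + [(n, 0.0) for n in negatives]:
        g = sigmoid(W_out[w] @ h) - label
        grad_h += g * W_out[w]
        W_out[w] -= lr * g * h
    return grad_h

def train_pair(doc_id, context, target, lr=0.025, k=5):
    """One training example: context words (assumed distinct) + target word."""
    negatives = rng.integers(0, V, size=k)
    # CBOW branch: average the context words AND the global text vector,
    # so the prediction sees local (window) and document-level evidence.
    n = len(context) + 1
    h = (W_in[context].sum(axis=0) + W_doc[doc_id]) / n
    grad_h = ns_update(h, target, negatives, lr)
    W_in[context] -= lr * grad_h / n
    W_doc[doc_id] -= lr * grad_h / n
    # SKIP-GRAM branch: the target predicts each context word, reusing the
    # same W_in / W_out, which is what ties the two variants together here.
    for c in context:
        negatives = rng.integers(0, V, size=k)
        W_in[target] -= lr * ns_update(W_in[target], c, negatives, lr)

# Toy usage: one (document, context window, target) example with word ids.
train_pair(doc_id=3, context=[10, 42, 77, 5], target=23)
```

The point the sketch tries to make explicit is the sharing: both branches read and write the same W_in and W_out, so gradients from the local window and from the document-level vector accumulate in a single set of word representations.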