Knowledge-Based Systems

Learning distributed word representation with multi-contextual mixed embedding

Abstract

Learning distributed word representations has been a popular method for various natural language processing applications such as word analogy and similarity, document classification, and sentiment analysis. However, most existing word embedding models exploit only a shallow sliding window as the context for predicting the target word. Because the semantics of each word are also influenced by its global context, as distributional models usually induce word representations from a global co-occurrence matrix, window-based models are insufficient to capture semantic knowledge. In this paper, we propose a novel hybrid model called mixed word embedding (MWE) based on the well-known word2vec toolbox. Specifically, the proposed MWE model combines the two variants of word2vec, i.e., SKIP-GRAM and CBOW, in a seamless way by sharing a common encoding structure, which allows it to capture the syntactic information of words more accurately. Furthermore, it incorporates a global text vector into the CBOW variant so as to capture more semantic information. Our MWE preserves the same time complexity as SKIP-GRAM. To evaluate the MWE model efficiently and adaptively, we study it from both linguistic and application perspectives on English and Chinese datasets. For linguistics, we conduct empirical studies on word analogies and similarities. From the application point of view, we consider the learned latent representations on both document classification and sentiment analysis. The experimental results show that our MWE model is very competitive in all tasks compared with state-of-the-art word embedding models such as CBOW, SKIP-GRAM, and GloVe. (C) 2016 Elsevier B.V. All rights reserved.
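The abstract describes the architecture only at a high level. Below is a minimal NumPy sketch of the general idea: a CBOW-style branch whose averaged context is augmented with a per-document global text vector, and a SKIP-GRAM branch, with both branches reading and writing the same shared embedding matrices. Every concrete detail here (negative sampling with k negatives, the learning rate, averaging the document vector into the context, and names such as W_in, W_out, W_doc, train_pair) is an illustrative assumption, not the authors' exact formulation.

```python
# A toy sketch of a mixed CBOW + SKIP-GRAM objective with shared embeddings.
# All hyperparameters and names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

V, D, dim = 1000, 50, 100              # vocab size, number of documents, embedding dim
W_in  = rng.normal(0, 0.1, (V, dim))   # shared input (word) embeddings
W_out = rng.normal(0, 0.1, (V, dim))   # shared output embeddings (negative sampling)
W_doc = rng.normal(0, 0.1, (D, dim))   # one global text vector per document

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ns_update(h, target, negatives, lr=0.025):
    """One negative-sampling step: push h toward target, away from negatives.
    Updates W_out in place and returns the gradient w.r.t. h, so the caller
    can update whatever produced h."""
    grad_h = np.zeros_like(h)
    for w, label in [(target, 1.0)] + [(n, 0.0) for n in negatives]:
        g = sigmoid(W_out[w] @ h) - label
        grad_h += g * W_out[w]
        W_out[w] -= lr * g * h
    return grad_h

def train_pair(doc_id, context, target, lr=0.025, k=5):
    """One training example: context words (assumed distinct) + target word."""
    negatives = rng.integers(0, V, size=k)
    # CBOW branch: average the context words AND the global text vector,
    # so the prediction sees local (window) and document-level evidence.
    n = len(context) + 1
    h = (W_in[context].sum(axis=0) + W_doc[doc_id]) / n
    grad_h = ns_update(h, target, negatives, lr)
    W_in[context] -= lr * grad_h / n
    W_doc[doc_id] -= lr * grad_h / n
    # SKIP-GRAM branch: the target predicts each context word, reusing the
    # same W_in / W_out, which is what ties the two variants together here.
    for c in context:
        negatives = rng.integers(0, V, size=k)
        W_in[target] -= lr * ns_update(W_in[target], c, negatives, lr)

# Toy usage: one (document, context window, target) example with word ids.
train_pair(doc_id=3, context=[10, 42, 77, 5], target=23)
```

The point the sketch tries to make explicit is the sharing: both branches read and write the same W_in and W_out, so gradients from the local window and from the document-level vector accumulate in a single set of word representations.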