
Bilingual distributed word representations from document-aligned comparable data



Abstract

We propose a new model for learning bilingual word representations from non-parallel document-aligned data. Following recent advances in word representation learning, our model learns dense real-valued word vectors, that is, bilingual word embeddings (BWEs). Unlike prior work on inducing BWEs, which relied heavily on parallel sentence-aligned corpora and/or readily available translation resources such as dictionaries, this article shows that BWEs may be learned solely on the basis of document-aligned comparable data, without any additional lexical resources or syntactic information. We compare our approach with previous state-of-the-art models for learning bilingual word representations from comparable data that rely on the framework of multilingual probabilistic topic modeling (MuPTM), as well as with distributional local context-counting models. We demonstrate the utility of the induced BWEs in two semantic tasks: (1) bilingual lexicon extraction, and (2) suggesting word translations in context for polysemous words. Our simple yet effective BWE-based models significantly outperform the MuPTM-based and context-counting representation models from comparable data, as well as prior BWE-based models, and achieve the best reported results on both tasks for all three tested language pairs.
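The first task, bilingual lexicon extraction with BWEs, can be sketched as a nearest-neighbor search in the shared embedding space: for each source-language word, the translation candidate is the target-language word whose embedding has the highest cosine similarity. The vocabularies and vectors below are hypothetical toy values for illustration only; in the article, the embeddings themselves are induced from document-aligned comparable corpora.

```python
import numpy as np

# Toy shared bilingual embedding space (hypothetical values for illustration).
# In practice, these vectors would be BWEs induced from document-aligned
# comparable data, not hand-specified.
src_vocab = ["dog", "house"]              # source-language words (English)
tgt_vocab = ["hond", "huis", "kat"]       # target-language words (Dutch)
src_emb = np.array([[0.90, 0.10, 0.00],
                    [0.10, 0.90, 0.20]])
tgt_emb = np.array([[0.88, 0.12, 0.05],   # hond
                    [0.15, 0.92, 0.18],   # huis
                    [0.70, 0.30, 0.60]])  # kat

def extract_lexicon(src_emb, tgt_emb, src_vocab, tgt_vocab):
    """For each source word, return the target word whose embedding is the
    cosine nearest neighbor in the shared bilingual space."""
    s = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    t = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = s @ t.T  # cosine similarity matrix: |src_vocab| x |tgt_vocab|
    return {src_vocab[i]: tgt_vocab[j]
            for i, j in enumerate(sims.argmax(axis=1))}

print(extract_lexicon(src_emb, tgt_emb, src_vocab, tgt_vocab))
# → {'dog': 'hond', 'house': 'huis'}
```

With real induced BWEs, the same argmax-over-cosine-similarity search would be run over full vocabularies, typically returning a ranked list of translation candidates rather than only the single best match.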

