Computer Speech and Language

Context-dependent word representation for neural machine translation


Abstract

We first observe a potential weakness of continuous vector representations of symbols in neural machine translation: the continuous vector representation, or word embedding vector, of a symbol encodes multiple dimensions of similarity, which is equivalent to encoding more than one meaning of the word. As a consequence, the encoder and decoder recurrent networks in neural machine translation must spend a substantial amount of their capacity disambiguating source and target words based on the context defined by the source sentence. Based on this observation, in this paper we propose to contextualize the word embedding vectors using a nonlinear bag-of-words representation of the source sentence. Additionally, we propose to represent special tokens (such as numbers, proper nouns and acronyms) with typed symbols to facilitate translating those words that are not well suited to be translated via continuous vectors. Experiments on En-Fr and En-De reveal that the proposed contextualization and symbolization approaches significantly improve the translation quality of neural machine translation systems.
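The contextualization idea described in the abstract lends itself to a short illustration. The sketch below (PyTorch, with a hypothetical `ContextualizedEmbedding` module) averages the source sentence's word embeddings into a bag-of-words vector, passes it through a small nonlinear network, and uses the result as a sentence-level gate over each word embedding. The two-layer MLP, the sigmoid gate, and the element-wise product are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of contextualized word embeddings, assuming PyTorch.
# The gating network and the element-wise combination are illustrative
# assumptions; the paper's exact formulation may differ.
import torch
import torch.nn as nn


class ContextualizedEmbedding(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # Nonlinear transform of the sentence-level bag-of-words vector.
        self.context_mlp = nn.Sequential(
            nn.Linear(emb_dim, emb_dim),
            nn.Tanh(),
            nn.Linear(emb_dim, emb_dim),
            nn.Sigmoid(),  # gate in (0, 1) over embedding dimensions
        )

    def forward(self, src_ids: torch.Tensor) -> torch.Tensor:
        # src_ids: (batch, seq_len) token indices of the source sentence.
        emb = self.embedding(src_ids)        # (batch, seq_len, emb_dim)
        bow = emb.mean(dim=1, keepdim=True)  # bag-of-words summary of the sentence
        gate = self.context_mlp(bow)         # (batch, 1, emb_dim)
        # Every word embedding is modulated by the same sentence-level gate,
        # suppressing dimensions (word senses) irrelevant to this sentence.
        return emb * gate


# Example usage with toy sizes.
layer = ContextualizedEmbedding(vocab_size=1000, emb_dim=64)
tokens = torch.randint(0, 1000, (2, 7))  # two source sentences of 7 tokens
contextual = layer(tokens)                # (2, 7, 64)
```

In a full system, the gated embeddings would presumably replace the raw embeddings fed to the encoder (and analogously on the decoder side), so that less recurrent capacity is spent on sense disambiguation.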
