Workshop on Discourse in Machine Translation

Data Augmentation using Back-translation for Context-aware Neural Machine Translation



Abstract

A single sentence does not always convey the information required to translate it into other languages: we sometimes need to add or specialize words that are omitted or ambiguous in the source language (e.g., zero pronouns in translating Japanese to English, or epicene pronouns in translating English to French). To translate such ambiguous sentences, we exploit the context around the source sentence, and have so far explored context-aware neural machine translation (NMT). However, large parallel corpora are not easily available for training accurate context-aware NMT models. In this study, we first obtain large-scale pseudo parallel corpora by back-translating target-side monolingual corpora, and then investigate their impact on the translation performance of context-aware NMT models. We evaluate NMT models trained with small parallel corpora and the large-scale pseudo parallel corpora on the IWSLT2017 English-Japanese and English-French datasets, and demonstrate the large impact of the data augmentation for context-aware NMT models in terms of BLEU score and specialized test sets on ja→en and fr→en.
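The augmentation pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `reverse_translate` argument stands in for a trained target-to-source NMT model, and the document grouping is only meant to show why order must be preserved so that preceding sentences remain usable as context for a context-aware model.

```python
def back_translate_corpus(monolingual_tgt, reverse_translate):
    """Build pseudo parallel pairs by back-translating target-side
    monolingual sentences into the source language.

    `reverse_translate` is assumed to be a target->source translation
    function (in practice, a trained NMT model); here it is a stub.
    Document order is preserved so that, for context-aware training,
    each sentence's preceding sentences can later serve as context.
    """
    return [(reverse_translate(tgt), tgt) for tgt in monolingual_tgt]


def augment_training_data(parallel, pseudo_parallel):
    """Concatenate the genuine parallel corpus with the pseudo parallel
    corpus; real setups often additionally tag or oversample the
    genuine portion."""
    return list(parallel) + list(pseudo_parallel)


if __name__ == "__main__":
    # Toy target-side monolingual corpus and a placeholder reverse model.
    mono = ["bonjour .", "merci beaucoup ."]
    pseudo = back_translate_corpus(mono, lambda s: "[src] " + s)
    combined = augment_training_data([("hello .", "bonjour .")], pseudo)
    print(len(combined))  # 1 genuine + 2 pseudo pairs
```

The combined corpus would then be fed to the context-aware NMT training procedure in place of the small genuine corpus alone.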

