Annual Meeting of the Association for Computational Linguistics

Improving Neural Machine Translation Models with Monolingual Data



Abstract

Neural Machine Translation (NMT) has obtained state-of-the-art performance for several language pairs, while only using parallel data for training. Target-side monolingual data plays an important role in boosting fluency for phrase-based statistical machine translation, and we investigate the use of monolingual data for NMT. In contrast to previous work, which combines NMT models with separately trained language models, we note that encoder-decoder NMT architectures already have the capacity to learn the same information as a language model, and we explore strategies to train with monolingual data without changing the neural network architecture. By pairing monolingual training data with an automatic back-translation, we can treat it as additional parallel training data, and we obtain substantial improvements on the WMT 15 task English↔German (+2.8-3.7 BLEU) and for the low-resource IWSLT 14 task Turkish↔English (+2.1-3.4 BLEU), obtaining new state-of-the-art results. We also show that fine-tuning on in-domain monolingual and parallel data gives substantial improvements for the IWSLT 15 task English→German.
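The back-translation scheme the abstract describes can be sketched as a small data pipeline: target-side monolingual sentences are translated back into the source language by a reverse (target→source) model, and each (synthetic source, real target) pair is then mixed into the parallel training data. The sketch below is illustrative only; `backtranslate` is a hypothetical stand-in for a trained reverse NMT system, not part of the paper's code.

```python
def backtranslate(target_sentence):
    # Hypothetical reverse model: in a real system this would be a full
    # target->source NMT decoder producing a synthetic source sentence.
    return "[synthetic-src] " + target_sentence

def build_synthetic_parallel(monolingual_target):
    """Pair each monolingual target sentence with its automatic
    back-translation, yielding (source, target) training pairs."""
    return [(backtranslate(t), t) for t in monolingual_target]

def mix_training_data(real_pairs, synthetic_pairs):
    # Train on the union of real and synthetic pairs; the network
    # architecture itself is left unchanged.
    return real_pairs + synthetic_pairs

mono = ["Das ist ein Test.", "Guten Morgen."]
real = [("This is a sentence.", "Das ist ein Satz.")]
data = mix_training_data(real, build_synthetic_parallel(mono))
```

The key design point is that the synthetic pairs have clean, human-written target sides, so translation noise on the source side still lets the decoder learn fluent target-language output.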


