Second Conference on Machine Translation (WMT 2017)

Copied Monolingual Data Improves Low-Resource Neural Machine Translation

Abstract

We train a neural machine translation (NMT) system to both translate source-language text and copy target-language text, thereby exploiting monolingual corpora in the target language. Specifically, we create a bitext from the monolingual text in the target language so that each source sentence is identical to the target sentence. This copied data is then mixed with the parallel corpus, and the NMT system is trained as usual, with no metadata to distinguish the two input languages. Our proposed method proves to be an effective way of incorporating monolingual data into low-resource NMT. On Turkish↔English and Romanian↔English translation tasks, we see gains of up to 1.2 BLEU over a strong baseline with back-translation. Further analysis shows that the linguistic phenomena behind these gains are different from and largely orthogonal to back-translation, with our copied corpus method improving accuracy on named entities and other words that should remain identical between the source and target languages.
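
As a rough, hypothetical sketch of the data preparation described in the abstract (not the authors' released code), the following Python script builds a copied pseudo-parallel corpus from a monolingual target-language file and simply concatenates it with an existing parallel corpus; all file names are placeholders.

    # Sketch of the copied-corpus setup described in the abstract.
    # File names are hypothetical placeholders, not from the paper.

    def build_copied_bitext(mono_tgt_path, out_src_path, out_tgt_path):
        """Write each monolingual target sentence to both sides,
        so that source == target for every copied pair."""
        with open(mono_tgt_path, encoding="utf-8") as mono, \
             open(out_src_path, "w", encoding="utf-8") as src_out, \
             open(out_tgt_path, "w", encoding="utf-8") as tgt_out:
            for line in mono:
                sentence = line.strip()
                if sentence:
                    src_out.write(sentence + "\n")  # "source" side is the copy
                    tgt_out.write(sentence + "\n")  # target side is the same sentence

    def concat_files(paths, out_path):
        """Concatenate corpora; the copied data is mixed with the real
        parallel data, with no tag marking which pairs are which."""
        with open(out_path, "w", encoding="utf-8") as out:
            for path in paths:
                with open(path, encoding="utf-8") as f:
                    for line in f:
                        out.write(line)

    if __name__ == "__main__":
        build_copied_bitext("mono.en", "copied.src", "copied.tgt")
        concat_files(["parallel.src", "copied.src"], "train.src")
        concat_files(["parallel.tgt", "copied.tgt"], "train.tgt")

The resulting train.src / train.tgt pair would then be preprocessed and fed to a standard NMT toolkit exactly like an ordinary parallel corpus, since the method requires no metadata to distinguish copied pairs from true parallel pairs.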