Venue: Conference on Empirical Methods in Natural Language Processing

Back-Translation Sampling by Targeting Difficult Words in Neural Machine Translation



Abstract

Neural Machine Translation has achieved state-of-the-art performance for several language pairs using a combination of parallel and synthetic data. Synthetic data is often generated by back-translating sentences randomly sampled from monolingual data using a reverse translation model. While back-translation has been shown to be very effective in many cases, it is not entirely clear why. In this work, we explore different aspects of back-translation, and show that words with high prediction loss during training benefit most from the addition of synthetic data. We introduce several variations of sampling strategies targeting difficult-to-predict words using prediction losses and frequencies of words. In addition, we also target the contexts of difficult words and sample sentences that are similar in context. Experimental results for the WMT news translation task show that our method improves translation quality by up to 1.7 and 1.2 BLEU points over back-translation using random sampling for German→English and English→German, respectively.
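The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea: estimate which words are hard to predict from per-token training losses, then prefer monolingual sentences that contain those words. The function names, the fixed loss threshold, and whitespace tokenization are all assumptions made for illustration. The frequency-based and context-similarity variants mentioned above are not covered.

```python
import random
from collections import defaultdict


def mean_loss_per_word(token_losses):
    """Average the prediction loss observed for each word type."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for word, loss in token_losses:
        totals[word] += loss
        counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


def difficult_words(token_losses, loss_threshold):
    """Return words whose mean loss exceeds a (hypothetical) threshold."""
    means = mean_loss_per_word(token_losses)
    return {word for word, loss in means.items() if loss > loss_threshold}


def sample_targeted(monolingual, hard_words, budget, seed=0):
    """Sample up to `budget` sentences containing at least one hard word."""
    rng = random.Random(seed)
    # Assumption: sentences are pre-tokenized and whitespace-separated.
    candidates = [s for s in monolingual if hard_words & set(s.split())]
    rng.shuffle(candidates)
    return candidates[:budget]


# Toy data: (word, loss) pairs as they might be logged during training.
token_losses = [
    ("the", 0.1), ("bank", 2.5), ("the", 0.3), ("bank", 3.5), ("river", 0.8),
]
hard = difficult_words(token_losses, loss_threshold=1.0)

monolingual = [
    "the bank was closed",
    "the river is wide",
    "a bank by the river",
]
selected = sample_targeted(monolingual, hard, budget=2)
```

The selected sentences would then be back-translated with the reverse model, in place of a purely random sample of the monolingual data.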

