4th Workshop on Asian Translation

Improving Low-Resource Neural Machine Translation with Filtered Pseudo-parallel Corpus



Abstract

Large-scale parallel corpora are indispensable for training highly accurate machine translation systems. However, manually constructed large-scale parallel corpora are not freely available for many language pairs. In previous studies, training data have been expanded with a pseudo-parallel corpus obtained by machine-translating a monolingual corpus in the target language. However, for low-resource language pairs, where only low-accuracy machine translation systems are available, translation quality degrades when a pseudo-parallel corpus is used naively. To improve machine translation performance for low-resource language pairs, we propose a method that expands the training data effectively by filtering the pseudo-parallel corpus with a quality estimate based on back-translation. In experiments with three language pairs using small, medium, and large parallel corpora, the language pairs with less training data filtered out more sentence pairs and showed larger BLEU score improvements.
