Journal: ACM Transactions on Asian and Low-Resource Language Information Processing

Filtered Pseudo-parallel Corpus Improves Low-resource Neural Machine Translation


Abstract

Large-scale parallel corpora are essential for training high-quality machine translation systems; however, such corpora are not freely available for many language pairs. Previous work has augmented the training data with pseudo-parallel corpora obtained by using machine translation models to translate monolingual corpora into the source language. However, for low-resource language pairs, where only low-accuracy machine translation systems are available, translation quality degrades when a pseudo-parallel corpus is used naively. To improve machine translation performance for low-resource language pairs, we propose a method that effectively expands the training data by filtering the pseudo-parallel corpus using quality estimation based on sentence-level round-trip translation. In experiments with three language pairs that used small, medium, and large parallel corpora, BLEU scores improved significantly for the low-resource language pairs. Additionally, we investigate the effect of iterative bootstrapping on translation performance and confirm that bootstrapping can further improve it.
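
The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Translator` interface, the `token_f1` similarity (a simple unigram-overlap stand-in for whatever sentence-level metric the paper uses), the `threshold` value, and the toy translators are all assumptions made only so the example runs. Each target-language sentence is back-translated into the source language, translated forward again, and the pair is kept only if the round trip stays close to the original sentence.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# Hypothetical interface: a translator maps one sentence to one sentence.
Translator = Callable[[str], str]


def token_f1(reference: str, hypothesis: str) -> float:
    """Unigram-overlap F1 on whitespace tokens (assumed stand-in metric)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref or not hyp:
        return 0.0
    overlap = sum((Counter(ref) & Counter(hyp)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def filter_pseudo_parallel(
    target_sentences: Iterable[str],
    target_to_source: Translator,
    source_to_target: Translator,
    threshold: float = 0.5,  # assumed cut-off, not a value from the paper
) -> List[Tuple[str, str]]:
    """Keep (synthetic source, original target) pairs that survive the round trip."""
    kept = []
    for target in target_sentences:
        synthetic_source = target_to_source(target)      # back-translation
        round_trip = source_to_target(synthetic_source)  # back into the target language
        if token_f1(target, round_trip) >= threshold:
            kept.append((synthetic_source, target))
    return kept


# Toy stand-ins for real MT systems, only to make the sketch runnable.
def toy_back(sentence: str) -> str:
    return sentence.upper()


def toy_forward(sentence: str) -> str:
    # Simulates a weak model that loses most of a long input.
    words = sentence.lower().split()
    return " ".join(words) if len(words) <= 4 else words[0]


monolingual = [
    "the cat sleeps",
    "a very long sentence that the weak model garbles",
]
pairs = filter_pseudo_parallel(monolingual, toy_back, toy_forward)
print(pairs)
```

With the toy translators, the short sentence survives the round trip unchanged and is kept, while the long one collapses to a single word and falls below the threshold. The sketch covers a single filtering pass only; the iterative bootstrapping mentioned in the abstract is not shown.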
