4th Workshop on Asian Translation

Improving Low-Resource Neural Machine Translation with Filtered Pseudo-parallel Corpus



Abstract

Large-scale parallel corpora are indispensable for training highly accurate machine translation systems. However, manually constructed large-scale parallel corpora are not freely available for many language pairs. In previous studies, training data have been expanded with a pseudo-parallel corpus obtained by machine-translating a monolingual corpus in the target language. However, for low-resource language pairs, where only low-accuracy machine translation systems are available, translation quality degrades when a pseudo-parallel corpus is used naively. To improve machine translation performance for low-resource language pairs, we propose a method that expands the training data effectively by filtering the pseudo-parallel corpus with a quality estimate based on back-translation. In experiments with three language pairs using small, medium, and large parallel corpora, the language pairs with less training data filtered out more sentence pairs and showed larger BLEU score improvements.
