Second Conference on Machine Translation (WMT 2017)

Copied Monolingual Data Improves Low-Resource Neural Machine Translation

Abstract

We train a neural machine translation (NMT) system to both translate source-language text and copy target-language text, thereby exploiting monolingual corpora in the target language. Specifically, we create a bitext from the monolingual text in the target language so that each source sentence is identical to the target sentence. This copied data is then mixed with the parallel corpus, and the NMT system is trained as usual, with no metadata to distinguish the two input languages. Our proposed method proves to be an effective way of incorporating monolingual data into low-resource NMT. On Turkish↔English and Romanian↔English translation tasks, we see gains of up to 1.2 BLEU over a strong baseline with back-translation. Further analysis shows that the linguistic phenomena behind these gains are different from and largely orthogonal to back-translation, with our copied corpus method improving accuracy on named entities and other words that should remain identical between the source and target languages.
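
As a rough, hypothetical sketch of the data preparation described in the abstract (not the authors' released code), the following Python script builds a copied pseudo-parallel corpus from a monolingual target-language file and simply concatenates it with an existing parallel corpus; all file names are placeholders.

    # Sketch of the copied-corpus setup described in the abstract.
    # File names are hypothetical placeholders, not from the paper.

    def build_copied_bitext(mono_tgt_path, out_src_path, out_tgt_path):
        """Write each monolingual target sentence to both sides,
        so that source == target for every copied pair."""
        with open(mono_tgt_path, encoding="utf-8") as mono, \
             open(out_src_path, "w", encoding="utf-8") as src_out, \
             open(out_tgt_path, "w", encoding="utf-8") as tgt_out:
            for line in mono:
                sentence = line.strip()
                if sentence:
                    src_out.write(sentence + "\n")  # "source" side is the copy
                    tgt_out.write(sentence + "\n")  # target side is the same sentence

    def concat_files(paths, out_path):
        """Concatenate corpora; the copied data is mixed with the real
        parallel data, with no tag marking which pairs are which."""
        with open(out_path, "w", encoding="utf-8") as out:
            for path in paths:
                with open(path, encoding="utf-8") as f:
                    for line in f:
                        out.write(line)

    if __name__ == "__main__":
        build_copied_bitext("mono.en", "copied.src", "copied.tgt")
        concat_files(["parallel.src", "copied.src"], "train.src")
        concat_files(["parallel.tgt", "copied.tgt"], "train.tgt")

The resulting train.src / train.tgt pair would then be preprocessed and fed to a standard NMT toolkit exactly like an ordinary parallel corpus, since the method requires no metadata to distinguish copied pairs from true parallel pairs.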