Due to the scarcity of parallel training data for many language pairs, quasi-parallel or comparable training data is an important alternative resource for building machine translation systems for such pairs. Because comparable corpora are of lower quality than manually curated parallel data, using them for training can degrade the translation performance of an NMT model. We propose distillation as a remedy for effectively leveraging comparable data: a student model trained on the combined clean and comparable data is guided by a teacher model trained only on the high-quality clean data. Our experiments on Arabic-English, Chinese-English, and German-English translation demonstrate that distillation yields significant improvements over off-the-shelf use of comparable data and performs comparably to state-of-the-art noise-filtering methods.
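The abstract does not spell out the training objective; as a rough illustration of the kind of teacher guidance described above, the sketch below shows a standard word-level knowledge-distillation loss in PyTorch, where the student's cross-entropy on the reference tokens is interpolated with a KL term toward the frozen teacher's output distribution. The function name, the alpha/temperature parameters, and the padding handling are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_ids,
                      alpha=0.5, temperature=1.0, pad_id=0):
    """Sketch of a word-level knowledge-distillation loss (assumed form).

    student_logits, teacher_logits: (batch, tgt_len, vocab)
    gold_ids: (batch, tgt_len) reference token ids
    """
    vocab = student_logits.size(-1)

    # Standard cross-entropy against the reference translation,
    # skipping padding positions.
    ce = F.cross_entropy(
        student_logits.view(-1, vocab),
        gold_ids.view(-1),
        ignore_index=pad_id,
    )

    # KL(teacher || student) on temperature-softened distributions.
    # The teacher is detached: it was trained on clean data only and stays fixed.
    t = temperature
    kl_per_token = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits.detach() / t, dim=-1),
        reduction="none",
    ).sum(-1) * (t * t)

    # Average the KL term over non-padding target positions only.
    mask = (gold_ids != pad_id).float()
    kl = (kl_per_token * mask).sum() / mask.sum().clamp(min=1.0)

    return alpha * ce + (1.0 - alpha) * kl
```

In this reading, the clean-data teacher acts as a soft reference on the noisier comparable examples, so the student can still learn from them without fully trusting their (possibly misaligned) target side.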