Workshop on Discourse in Machine Translation

Data Augmentation using Back-translation for Context-aware Neural Machine Translation



Abstract

A single sentence does not always convey the information required to translate it into other languages: we sometimes need to add or specialize words that are omitted or ambiguous in the source language (e.g., zero pronouns in translating Japanese to English, or epicene pronouns in translating English to French). To translate such ambiguous sentences, we exploit the context around the source sentence, and have so far explored context-aware neural machine translation (NMT). However, large parallel corpora are not easily available for training accurate context-aware NMT models. In this study, we first obtain large-scale pseudo parallel corpora by back-translating target-side monolingual corpora, and then investigate their impact on the translation performance of context-aware NMT models. We evaluate NMT models trained with small parallel corpora and the large-scale pseudo parallel corpora on the IWSLT2017 English-Japanese and English-French datasets, and demonstrate the large impact of the data augmentation for context-aware NMT models in terms of BLEU score and specialized test sets on ja→en and fr→en.
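The augmentation pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `reverse_translate` argument stands in for a trained target-to-source NMT model, and the document grouping is only meant to show why order must be preserved so that preceding sentences remain usable as context for a context-aware model.

```python
def back_translate_corpus(monolingual_tgt, reverse_translate):
    """Build pseudo parallel pairs by back-translating target-side
    monolingual sentences into the source language.

    `reverse_translate` is assumed to be a target->source translation
    function (in practice, a trained NMT model); here it is a stub.
    Document order is preserved so that, for context-aware training,
    each sentence's preceding sentences can later serve as context.
    """
    return [(reverse_translate(tgt), tgt) for tgt in monolingual_tgt]


def augment_training_data(parallel, pseudo_parallel):
    """Concatenate the genuine parallel corpus with the pseudo parallel
    corpus; real setups often additionally tag or oversample the
    genuine portion."""
    return list(parallel) + list(pseudo_parallel)


if __name__ == "__main__":
    # Toy target-side monolingual corpus and a placeholder reverse model.
    mono = ["bonjour .", "merci beaucoup ."]
    pseudo = back_translate_corpus(mono, lambda s: "[src] " + s)
    combined = augment_training_data([("hello .", "bonjour .")], pseudo)
    print(len(combined))  # 1 genuine + 2 pseudo pairs
```

The combined corpus would then be fed to the context-aware NMT training procedure in place of the small genuine corpus alone.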

