ACM Transactions on Asian and Low-Resource Language Information Processing

Unsupervised Neural Machine Translation for Similar and Distant Language Pairs: An Empirical Study



Abstract

Unsupervised neural machine translation (UNMT) has achieved remarkable results for several language pairs, such as French-English and German-English. Most previous studies have focused on modeling UNMT systems; few studies have investigated the effect of UNMT on specific languages. In this article, we first empirically investigate UNMT for four diverse language pairs (French/German/Chinese/Japanese-English). We confirm that the performance of UNMT in translation tasks for similar language pairs (French/German-English) is dramatically better than for distant language pairs (Chinese/Japanese-English). We empirically show that the lack of shared words and different word orderings are the main reasons that lead UNMT to underperform in Chinese/Japanese-English. Based on these findings, we propose several methods, including artificial shared words and pre-ordering, to improve the performance of UNMT for distant language pairs. Moreover, we propose a simple general method to improve translation performance for all these four language pairs. The existing UNMT model can generate a translation of a reasonable quality after a few training epochs owing to a denoising mechanism and shared latent representations. However, learning shared latent representations restricts the performance of translation in both directions, particularly for distant language pairs, while denoising dramatically delays convergence by continuously modifying the training data. To avoid these problems, we propose a simple, yet effective and efficient, approach that (like UNMT) relies solely on monolingual corpora: pseudo-data-based unsupervised neural machine translation. Experimental results for these four language pairs show that our proposed methods significantly outperform UNMT baselines.
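The denoising mechanism mentioned in the abstract is, in standard UNMT training, a noise model that drops and locally shuffles words in monolingual sentences so the model learns to reconstruct the clean input. The Python sketch below is only an illustration of this commonly used noise model, not code from the paper; the function name add_noise and the parameter values are assumptions.

import random

def add_noise(tokens, drop_prob=0.1, shuffle_window=3):
    """Word dropout plus local shuffle, as commonly used for UNMT denoising."""
    # Word dropout: randomly remove tokens, but keep at least one.
    kept = [t for t in tokens if random.random() > drop_prob]
    if not kept:
        kept = [random.choice(tokens)]
    # Local shuffle: perturb each position by a small random offset and
    # re-sort, so tokens only move within a short window.
    keys = [i + random.uniform(0, shuffle_window) for i in range(len(kept))]
    return [tok for _, tok in sorted(zip(keys, kept), key=lambda p: p[0])]

# The denoising objective trains the encoder-decoder to map
# add_noise(sentence) back to the original sentence, using monolingual data only.
print(add_noise("the quick brown fox jumps over the lazy dog".split()))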
