ACM Transactions on Asian and Low-Resource Language Information Processing

Unsupervised Neural Machine Translation for Similar and Distant Language Pairs: An Empirical Study



Abstract

Unsupervised neural machine translation (UNMT) has achieved remarkable results for several language pairs, such as French-English and German-English. Most previous studies have focused on modeling UNMT systems; few studies have investigated the effect of UNMT on specific languages. In this article, we first empirically investigate UNMT for four diverse language pairs (French/German/Chinese/Japanese-English). We confirm that the performance of UNMT in translation tasks for similar language pairs (French/German-English) is dramatically better than for distant language pairs (Chinese/Japanese-English). We empirically show that the lack of shared words and different word orderings are the main reasons that lead UNMT to underperform in Chinese/Japanese-English. Based on these findings, we propose several methods, including artificial shared words and pre-ordering, to improve the performance of UNMT for distant language pairs. Moreover, we propose a simple general method to improve translation performance for all these four language pairs. The existing UNMT model can generate a translation of a reasonable quality after a few training epochs owing to a denoising mechanism and shared latent representations. However, learning shared latent representations restricts the performance of translation in both directions, particularly for distant language pairs, while denoising dramatically delays convergence by continuously modifying the training data. To avoid these problems, we propose a simple, yet effective and efficient, approach that (like UNMT) relies solely on monolingual corpora: pseudo-data-based unsupervised neural machine translation. Experimental results for these four language pairs show that our proposed methods significantly outperform UNMT baselines.
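The denoising mechanism mentioned in the abstract is, in standard UNMT training, a noise model that drops and locally shuffles words in monolingual sentences so the model learns to reconstruct the clean input. The Python sketch below is only an illustration of this commonly used noise model, not code from the paper; the function name add_noise and the parameter values are assumptions.

import random

def add_noise(tokens, drop_prob=0.1, shuffle_window=3):
    """Word dropout plus local shuffle, as commonly used for UNMT denoising."""
    # Word dropout: randomly remove tokens, but keep at least one.
    kept = [t for t in tokens if random.random() > drop_prob]
    if not kept:
        kept = [random.choice(tokens)]
    # Local shuffle: perturb each position by a small random offset and
    # re-sort, so tokens only move within a short window.
    keys = [i + random.uniform(0, shuffle_window) for i in range(len(kept))]
    return [tok for _, tok in sorted(zip(keys, kept), key=lambda p: p[0])]

# The denoising objective trains the encoder-decoder to map
# add_noise(sentence) back to the original sentence, using monolingual data only.
print(add_noise("the quick brown fox jumps over the lazy dog".split()))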
