ACM transactions on Asian language information processing

Multi-Round Transfer Learning for Low-Resource NMT Using Multiple High-Resource Languages



Abstract

Neural machine translation (NMT) has made remarkable progress in recent years, but its performance suffers from a data sparsity problem, since large-scale parallel corpora are readily available only for high-resource languages (HRLs). Recently, transfer learning (TL) has been widely used in machine translation for low-resource languages (LRLs) and has become one of the main directions for addressing the data sparsity problem in low-resource NMT. In this setting, transfer learning is typically realized by initializing the low-resource (child) model with the parameters of a high-resource (parent) model. However, the original TL approach can neither make full use of multiple highly related HRLs nor let a child receive different parameters from the same parent. To exploit multiple HRLs effectively, we present a language-independent and straightforward multi-round transfer learning (MRTL) approach to low-resource NMT. In addition, to reduce the differences between high-resource and low-resource languages at the character level, we introduce a unified transliteration method for various language families whose languages are semantically and syntactically highly analogous to one another. Experiments on low-resource datasets show that our approaches are effective, significantly outperform state-of-the-art methods, and yield improvements of up to 5.63 BLEU points.
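The multi-round transfer described in the abstract can be pictured as a chain of parent-to-child initializations: train on the highest-resource pair first, copy its parameters into a fresh child, continue training on the next pair, and repeat until the low-resource pair. The sketch below is a hypothetical PyTorch illustration of that chain, not the authors' implementation; `TinyNMT`, `train_one_round`, and the toy data are stand-ins, and a single shared vocabulary across rounds is assumed (which a unified transliteration to a common character inventory would make plausible).

```python
# Hypothetical sketch of multi-round transfer learning (MRTL):
# each round's child model is initialized from the previous round's
# trained parent before training on its own language pair.
import torch
import torch.nn as nn

class TinyNMT(nn.Module):
    """Toy encoder-decoder standing in for a real NMT model."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, src, tgt):
        _, h = self.encoder(self.embed(src))        # encode source
        dec, _ = self.decoder(self.embed(tgt), h)   # decode from final state
        return self.out(dec)

def train_one_round(model, batches, epochs=1, lr=1e-3):
    """Placeholder training loop for one (parent or child) language pair."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for src, tgt in batches:
            logits = model(src, tgt[:, :-1])        # teacher forcing
            loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                           tgt[:, 1:].reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

def multi_round_transfer(rounds):
    """Chain of transfers: HRL pair 1 -> HRL pair 2 -> ... -> LRL pair.

    `rounds` is an ordered list of corpora, highest-resource first and
    the low-resource pair last; each child inherits all parameters from
    the previous round's trained parent.
    """
    parent = None
    for corpus in rounds:
        child = TinyNMT()
        if parent is not None:
            child.load_state_dict(parent.state_dict())  # parent -> child init
        parent = train_one_round(child, corpus)
    return parent

# Toy usage: three rounds on random data (real rounds would use, e.g.,
# two related HRL corpora followed by the LRL corpus).
toy = [(torch.randint(0, 1000, (8, 10)), torch.randint(0, 1000, (8, 11)))]
final_model = multi_round_transfer([toy, toy, toy])
```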
