American Journal of Artificial Intelligence

Spanish-Turkish Low-Resource Machine Translation: Unsupervised Learning vs Round-Tripping

Abstract

The quality of data-driven Machine Translation (MT) depends strongly on both the quantity and the quality of the training data. In practice, however, collecting a large parallel training corpus is difficult. Although various approaches have been proposed to address this issue, the lack of large parallel corpora remains a major practical problem for many language pairs. Because monolingual data plays an important role in boosting the fluency of Neural MT (NMT) models, this paper investigates and compares two learning-based translation approaches for Spanish-Turkish translation, a low-resource setting in which only large monolingual datasets are available in each language: 1) the Unsupervised Learning approach, and 2) the Round-Tripping approach. Both approaches remove the need for bilingual data and allow the NMT system to be trained on monolingual data alone. We use an attention-based NMT (Attentional NMT) model that leverages careful parameter initialization, the denoising effect of language models, and the automatic generation of bilingual data. Our experimental results show that the Unsupervised Learning approach outperforms the Round-Tripping approach in both translation directions, Spanish-to-Turkish and Turkish-to-Spanish. These results confirm that the Unsupervised Learning approach remains a reliable learning-based translation technique for Spanish-Turkish low-resource NMT.
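As a rough illustration of two ingredients the abstract names, denoising and the automatic generation of bilingual data, the following Python sketch shows one common way such steps are set up. It is not the authors' implementation: the function names `add_noise` and `back_translate`, the parameters `drop_prob` and `max_shift`, and the stand-in translator are all hypothetical, and a real system would call a trained NMT model where the placeholder is used.

```python
import random
from typing import Callable, List, Optional, Tuple


def add_noise(tokens: List[str], drop_prob: float = 0.1, max_shift: float = 3.0,
              rng: Optional[random.Random] = None) -> List[str]:
    """Corrupt a sentence for denoising training (hypothetical helper).

    Words are dropped at random, then the survivors are shuffled locally.
    A model trained to rebuild the clean sentence from this output acts
    as a denoising language model.
    """
    if not tokens:
        return []
    rng = rng or random.Random()
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept:
        # Never return an empty sentence: keep one random word.
        kept = [rng.choice(tokens)]
    # Local shuffle: sort by index plus a bounded random offset, so each
    # token stays within about max_shift positions of where it started.
    keys = [i + rng.uniform(0, max_shift) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda pair: pair[0])]


def back_translate(mono_target: List[str],
                   translate_tgt_to_src: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from monolingual target text.

    translate_tgt_to_src is an assumed placeholder for the current
    target-to-source model; the synthetic source side may be noisy, but
    the target side is genuine text.
    """
    return [(translate_tgt_to_src(sentence), sentence) for sentence in mono_target]


if __name__ == "__main__":
    sentence = "el gato duerme en la casa".split()
    print(add_noise(sentence, rng=random.Random(0)))
    # Stand-in translator that only tags its input; it does not translate.
    print(back_translate(["kedi evde uyuyor"], lambda s: "<es> " + s))
```

In this kind of setup the synthetic pairs are typically fed back into training for the opposite direction, and the two directions are improved in alternation.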