
Learning cross-lingual phonological and orthographic adaptations: a case study in improving neural machine translation between low-resource languages



Abstract

Out-of-vocabulary (OOV) words can pose serious challenges for machine translation (MT) tasks, and in particular for low-resource language (LRL) pairs, i.e., language pairs for which few or no parallel corpora exist. Our work adapts variants of seq2seq models to perform transduction of such words from Hindi to Bhojpuri (an LRL instance), learning from a set of cognate pairs built from a bilingual dictionary of Hindi-Bhojpuri words. We demonstrate that our models can be effectively used for language pairs that have limited parallel corpora; our models work at the character level to grasp phonetic and orthographic similarities across multiple types of word adaptations, whether synchronic or diachronic, loanwords or cognates. We describe the training aspects of several character-level NMT systems that we adapted to this task and characterize their typical errors. Our method improves the BLEU score by 6.3 on the Hindi-to-Bhojpuri translation task. Further, we show that such transductions can generalize well to other languages by applying them successfully to Hindi-Bangla cognate pairs. Our work can be seen as an important step in the process of: (i) resolving the OOV word problem arising in MT tasks; (ii) creating effective parallel corpora for resource-constrained languages; and (iii) leveraging the enhanced semantic knowledge captured by word-level embeddings to perform character-level tasks.
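The abstract describes character-level seq2seq transduction of Hindi words into their Bhojpuri cognates, learned from dictionary-derived cognate pairs. The sketch below illustrates that general idea with a minimal GRU encoder-decoder in PyTorch; the class name `CharSeq2Seq`, the `transduce` helper, the romanized toy pairs, and all hyperparameters are illustrative assumptions and do not reproduce the paper's actual models, data, or results.

```python
# Minimal sketch, assuming a GRU encoder-decoder: map a Hindi word to a
# Bhojpuri cognate character by character. Toy pairs and settings below are
# hypothetical, used only to make the sketch runnable.
import torch
import torch.nn as nn

# Hypothetical romanized cognate pairs (not from the paper's dictionary).
pairs = [("ladka", "laika"), ("ghar", "ghare"), ("pani", "paani")]

SOS, EOS = "<s>", "</s>"
chars = sorted({c for src, tgt in pairs for c in src + tgt}) + [SOS, EOS]
stoi = {c: i for i, c in enumerate(chars)}

def encode(word, add_sos=False):
    toks = ([SOS] if add_sos else []) + list(word) + [EOS]
    return torch.tensor([stoi[c] for c in toks])

class CharSeq2Seq(nn.Module):
    def __init__(self, vocab, emb=64, hid=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, src, tgt_in):
        # Encode the source word; seed the decoder with the final hidden state.
        _, h = self.encoder(self.emb(src))
        dec_out, _ = self.decoder(self.emb(tgt_in), h)
        return self.out(dec_out)

model = CharSeq2Seq(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Teacher-forced training on the toy pairs (one pair per step for simplicity).
for epoch in range(200):
    for src_w, tgt_w in pairs:
        src = encode(src_w).unsqueeze(0)                        # l a d k a </s>
        tgt_in = encode(tgt_w, add_sos=True)[:-1].unsqueeze(0)  # <s> l a i k a
        tgt_out = encode(tgt_w).unsqueeze(0)                    # l a i k a </s>
        logits = model(src, tgt_in)
        loss = loss_fn(logits.view(-1, len(chars)), tgt_out.view(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

def transduce(word, max_len=20):
    # Greedy character-by-character decoding for an unseen (OOV) source word.
    with torch.no_grad():
        _, h = model.encoder(model.emb(encode(word).unsqueeze(0)))
        tok = torch.tensor([[stoi[SOS]]])
        out_chars = []
        for _ in range(max_len):
            dec, h = model.decoder(model.emb(tok), h)
            tok = model.out(dec)[:, -1].argmax(-1, keepdim=True)
            c = chars[tok.item()]
            if c == EOS:
                break
            out_chars.append(c)
        return "".join(out_chars)

print(transduce("ladka"))
```

The paper adapts and compares several character-level NMT variants for this task; the sketch shows only the core encode-decode idea, not attention or any of the specific systems the authors evaluate.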
