Learning cross-lingual phonological and orthographic adaptations: a case study in improving neural machine translation between low-resource languages

Saurav Jha; Akhilesh Sudhakar; Anil Kumar Singh


Abstract

Out-of-vocabulary (OOV) words can pose serious challenges for machine translation (MT) tasks, particularly for low-resource language (LRL) pairs, i.e., language pairs for which few or no parallel corpora exist. Our work adapts variants of seq2seq models to perform transduction of such words from Hindi to Bhojpuri (an LRL instance), learning from a set of cognate pairs built from a bilingual dictionary of Hindi-Bhojpuri words. We demonstrate that our models can be used effectively for language pairs that have limited parallel corpora; the models work at the character level to grasp phonetic and orthographic similarities across multiple types of word adaptations, whether synchronic or diachronic, loan words or cognates. We describe the training aspects of several character-level NMT systems that we adapted to this task and characterize their typical errors. Our method improves the BLEU score by 6.3 on the Hindi-to-Bhojpuri translation task. Further, we show that such transductions can generalize well to other languages by applying them successfully to Hindi-Bangla cognate pairs. Our work can be seen as an important step in the process of: (i) resolving the OOV word problem arising in MT tasks; (ii) creating effective parallel corpora for resource-constrained languages; and (iii) leveraging the enhanced semantic knowledge captured by word-level embeddings to perform character-level tasks.
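As an illustration of what "working at the character level" on cognate pairs can look like in practice, the following minimal sketch turns word pairs into integer sequences that a seq2seq model could consume. It is not the authors' pipeline: the function names `build_char_vocab` and `encode`, the special symbols, and the example pairs are all assumptions made for this sketch, and the pairs are made-up placeholder strings rather than real dictionary entries.

```python
from typing import Dict, List, Tuple

# Special symbols commonly used by seq2seq models (names are assumptions).
PAD, SOS, EOS = "<pad>", "<sos>", "<eos>"


def build_char_vocab(pairs: List[Tuple[str, str]]) -> Dict[str, int]:
    """Map every character seen on either side of the pairs to an integer id."""
    chars = sorted({ch for src, tgt in pairs for ch in src + tgt})
    return {sym: idx for idx, sym in enumerate([PAD, SOS, EOS] + chars)}


def encode(word: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a word into <sos> + one id per character + <eos>."""
    return [vocab[SOS]] + [vocab[ch] for ch in word] + [vocab[EOS]]


# Made-up placeholder strings, NOT real Hindi-Bhojpuri dictionary entries.
pairs = [("abc", "abd"), ("kaam", "kam")]
vocab = build_char_vocab(pairs)

# Each training example becomes a (source ids, target ids) tuple.
encoded = [(encode(src, vocab), encode(tgt, vocab)) for src, tgt in pairs]
```

Iterating over a Python `str` yields Unicode code points, so for Devanagari text dependent vowel signs and other combining marks would become separate symbols under this scheme. Whether the paper segments words this way is not stated in the abstract.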


Bibliographic record

Source: Journal of Language Modelling, 2019, No. 2, 42 pages
Authors: Saurav Jha; Akhilesh Sudhakar; Anil Kumar Singh
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: Neural machine translation; Hindi; Bhojpuri; word transduction; low resource language; attention model
Date added to database: 2022-08-18 16:39:36

Similar literature

1. Neural machine translation of low-resource languages using SMT phrase pair injection [J]. Sukanta Sen, Mohammed Hasanuzzaman, Asif Ekbal. Natural Language Engineering. 2021, No. Pta3
2. Extremely low-resource neural machine translation for Asian languages [J]. Rubino Raphael, Marie Benjamin, Dabre Raj. Machine Translation. 2020, No. 4
3. Neural machine translation for low-resource languages without parallel corpora [J]. Alina Karakanta, Jon Dehdari, Josef van Genabith. Machine Translation. 2018, No. 1a2
4. Transfer Learning across Low-Resource, Related Languages for Neural Machine Translation [C]. Toan Q. Nguyen, David Chiang. International Joint Conference on Natural Language Processing. 2017
5. Turkic Interlingua: A Case Study of Machine Translation in Low-Resource Languages [D]. Mirzakhalov, Jamshidbek. 2021
6. S136. Classifying Schizophrenia Using Phonological Semantic and Syntactic Features of Language; A Combinatory Machine Learning Approach [O]. Alban Voppel, Janna de Boer, Fleur Slegers. 2020
7. Learning cross-lingual phonological and orthographic adaptations: a case study in improving neural machine translation between low-resource languages [O]. Saurav Jha, Akhilesh Sudhakar, Anil Kumar Singh. 2019
