Journal: Computational Intelligence and Neuroscience

Obtaining Parallel Sentences in Low-Resource Language Pairs with Minimal Supervision



Abstract

Machine translation relies on parallel sentences, and the number available is an important factor in the performance of machine translation systems, especially for low-resource languages. Recent advances in learning cross-lingual word representations from nonparallel data open a new possibility for obtaining bilingual sentences with minimal supervision in low-resource languages. In this paper, we introduce a novel methodology that obtains parallel sentences using only a small bilingual seed lexicon of a few hundred entries. We first obtain bilingual semantic representations by using the seed lexicon to establish a cross-lingual mapping between monolingual word representations. Then, we construct a deep learning classifier to extract bilingual parallel sentences. We demonstrate the effectiveness of our methodology by harvesting Uyghur-Chinese parallel sentences and building a machine translation system from them. The experiments indicate that our method can obtain a large number of high-accuracy parallel sentences in low-resource language pairs.
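
The abstract does not say how the cross-lingual mapping is learned, so the following is only an illustrative sketch of one common way to do it: an orthogonal (Procrustes) mapping fitted on the embeddings of the seed-lexicon entries. The function names (`learn_orthogonal_mapping`, `sentence_vector`, `pair_score`) are hypothetical, and the cosine-similarity score stands in for the paper's deep learning classifier, which is not described here.

```python
import numpy as np


def learn_orthogonal_mapping(src_seed, tgt_seed):
    """Return the orthogonal W minimizing ||src_seed @ W - tgt_seed||_F.

    src_seed, tgt_seed: (n_pairs, dim) arrays; row i holds the embeddings
    of the i-th seed-lexicon entry in the source and target language.
    Assumption: Procrustes is a stand-in, not the paper's confirmed method.
    """
    # Closed-form Procrustes solution from the SVD of the cross-covariance.
    u, _, vt = np.linalg.svd(src_seed.T @ tgt_seed)
    return u @ vt


def sentence_vector(tokens, embeddings):
    """Average the embeddings of known tokens and L2-normalize the result."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    mean = np.mean(vecs, axis=0)
    norm = np.linalg.norm(mean)
    return mean / norm if norm > 0 else None


def pair_score(src_tokens, tgt_tokens, src_emb, tgt_emb, mapping):
    """Cosine similarity between a mapped source sentence and a target sentence.

    Assumption: a simple similarity score replaces the classifier here.
    """
    s = sentence_vector(src_tokens, src_emb)
    t = sentence_vector(tgt_tokens, tgt_emb)
    if s is None or t is None:
        return 0.0
    mapped = s @ mapping
    return float(mapped @ t / np.linalg.norm(mapped))
```

In a sketch like this, candidate sentence pairs whose score exceeds a chosen threshold would be kept as likely translations; a trained classifier, as in the paper, would replace that threshold decision.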
