Obtaining Parallel Sentences in Low-Resource Language Pairs with Minimal Supervision

Xiayang Shi; Ping Yue; Xinyi Liu; Chun Xu; Lin Xu

Abstract

Machine translation relies on parallel sentences, and the number available is an important factor in the performance of machine translation systems, especially for low-resource languages. Recent advances in learning cross-lingual word representations from nonparallel data open a new possibility for obtaining bilingual sentences with minimal supervision in low-resource languages. In this paper, we introduce a novel methodology for obtaining parallel sentences using only a small bilingual seed lexicon of a few hundred entries. We first obtain bilingual semantic representations by establishing a cross-lingual mapping between the monolingual representations of the two languages via the seed lexicon. Then, we construct a deep learning classifier to extract bilingual parallel sentences. We demonstrate the effectiveness of our methodology by harvesting Uyghur-Chinese parallel sentences and constructing a machine translation system. The experiments indicate that our method can obtain a large number of high-accuracy parallel sentences in low-resource language pairs.
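
The two steps named in the abstract (a cross-lingual mapping learned from a seed lexicon, followed by a decision on candidate sentence pairs) can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the authors' implementation: the abstract does not say how the mapping is fitted, so a plain least-squares linear map is used here; the paper's deep learning classifier is replaced by a simple cosine-similarity threshold; and the tokens, 2-D vectors, function names, and the 0.9 threshold are all hypothetical.

```python
import math

def apply_mapping(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def learn_mapping(seed_pairs, src_emb, tgt_emb, lr=0.1, epochs=500):
    """Fit W minimising the mean squared error ||W x - y||^2 over the
    seed lexicon by gradient descent (an assumed, simplified objective)."""
    xs = [src_emb[s] for s, _ in seed_pairs]
    ys = [tgt_emb[t] for _, t in seed_pairs]
    d, n = len(xs[0]), len(xs)
    W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(d)]
        for x, y in zip(xs, ys):
            pred = apply_mapping(W, x)
            for i in range(d):
                err = pred[i] - y[i]
                for j in range(d):
                    grad[i][j] += 2.0 * err * x[j] / n
        for i in range(d):
            for j in range(d):
                W[i][j] -= lr * grad[i][j]
    return W

def sentence_vector(tokens, emb, W=None):
    """Average the word vectors of a sentence, optionally mapping each one
    into the target space first. Assumes at least one known token."""
    vecs = [emb[t] for t in tokens if t in emb]
    if W is not None:
        vecs = [apply_mapping(W, v) for v in vecs]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]

def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0

def mine_pairs(src_sents, tgt_sents, src_emb, tgt_emb, W, threshold=0.9):
    """Return (source index, target index) pairs whose mapped sentence
    vectors are similar enough. The threshold stands in for a classifier."""
    pairs = []
    for i, s in enumerate(src_sents):
        sv = sentence_vector(s, src_emb, W)
        for j, t in enumerate(tgt_sents):
            if cosine(sv, sentence_vector(t, tgt_emb)) >= threshold:
                pairs.append((i, j))
    return pairs

# Hypothetical toy data: the source space is the target space rotated by 90 degrees.
src_emb = {"s_water": [0.0, -1.0], "s_book": [1.0, 0.0], "s_read": [0.8, -0.6]}
tgt_emb = {"t_water": [1.0, 0.0], "t_book": [0.0, 1.0], "t_read": [0.6, 0.8]}
seed_lexicon = [("s_water", "t_water"), ("s_book", "t_book")]

W = learn_mapping(seed_lexicon, src_emb, tgt_emb)
src_sents = [["s_book", "s_read"]]
tgt_sents = [["t_read", "t_book"], ["t_water"]]
pairs = mine_pairs(src_sents, tgt_sents, src_emb, tgt_emb, W)
```

In this toy setup the word `s_read` is not in the seed lexicon, yet the learned map still places it next to `t_read`, which is what lets a small lexicon generalize to unseen vocabulary. A real system would use high-dimensional embeddings trained on monolingual corpora and a trained classifier in place of the fixed threshold.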

Bibliographic record

Source
Computational Intelligence and Neuroscience | 2022, Issue 42 | Article ID 5296946 | 1 page
Authors
Xiayang Shi; Ping Yue; Xinyi Liu; Chun Xu; Lin Xu
Author affiliation

College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, China;

Indexing information
Original format: PDF
Language of text: English
Chinese Library Classification: Parasitology
Keywords

Obtaining Parallel Sentences in Low-Resource Language Pairs with Minimal Supervision
