Neural Machine Translation for Low Resource Languages using Bilingual Lexicon Induced from Comparable Corpora

Abstract

Resources for non-English languages are scarce, and this paper addresses the problem in the context of machine translation by automatically extracting parallel sentence pairs from multilingual articles available on the Internet. We use an end-to-end Siamese bidirectional recurrent neural network to extract parallel sentences from comparable multilingual articles in Wikipedia. We then show that training on the harvested dataset improves BLEU scores for both NMT and phrase-based SMT systems on two low-resource language pairs, English-Hindi and English-Tamil, compared with training exclusively on the limited bilingual corpora collected for these pairs.
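
The abstract describes scoring candidate sentence pairs with a Siamese bidirectional RNN. The following is a minimal, untrained sketch in plain Python under stated assumptions, not the authors' implementation: it assumes language-specific embedding tables, one forward and one backward Elman RNN cell whose weights are shared by both languages (the Siamese tie), a pair representation built from the element-wise absolute difference and product of the two sentence vectors, and a logistic output giving the probability that the pair is parallel. The names SiameseBiRNN and extract_pairs, the dimensions, and the 0.5 threshold are hypothetical. Because the weights are random, the scores carry no meaning until the model is trained on labeled parallel and non-parallel pairs.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def rand_vector(n: int, rng: random.Random) -> Vector:
    # Small random values; a real system would learn these by training.
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class RNNCell:
    """Elman RNN cell: h_t = tanh(W x_t + U h_{t-1} + b)."""

    def __init__(self, in_dim: int, hid_dim: int, rng: random.Random):
        self.W = [rand_vector(in_dim, rng) for _ in range(hid_dim)]
        self.U = [rand_vector(hid_dim, rng) for _ in range(hid_dim)]
        self.b = [0.0] * hid_dim

    def final_state(self, xs: Sequence[Vector]) -> Vector:
        h = [0.0] * len(self.b)
        for x in xs:
            pre = zip(matvec(self.W, x), matvec(self.U, h), self.b)
            h = [math.tanh(a + c + d) for a, c, d in pre]
        return h


class SiameseBiRNN:
    """Hypothetical parallel-sentence scorer; weights are untrained."""

    def __init__(self, src_vocab: Sequence[str], tgt_vocab: Sequence[str],
                 emb_dim: int = 8, hid_dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        # Assumption: each language has its own embedding table.
        self.emb: Dict[str, Dict[str, Vector]] = {
            "src": {w: rand_vector(emb_dim, rng) for w in src_vocab},
            "tgt": {w: rand_vector(emb_dim, rng) for w in tgt_vocab},
        }
        self.unk = [0.0] * emb_dim  # out-of-vocabulary tokens map to zeros
        # Siamese tie: the same forward/backward cells encode both languages.
        self.fwd = RNNCell(emb_dim, hid_dim, rng)
        self.bwd = RNNCell(emb_dim, hid_dim, rng)
        # Logistic layer over [|h_s - h_t|; h_s * h_t] (4 * hid_dim features).
        self.out = rand_vector(4 * hid_dim, rng)
        self.bias = 0.0

    def encode(self, tokens: Sequence[str], side: str) -> Vector:
        xs = [self.emb[side].get(t, self.unk) for t in tokens]
        # Concatenate the final forward state and the final backward state.
        return self.fwd.final_state(xs) + self.bwd.final_state(xs[::-1])

    def score(self, src: Sequence[str], tgt: Sequence[str]) -> float:
        hs, ht = self.encode(src, "src"), self.encode(tgt, "tgt")
        feats = [abs(a - b) for a, b in zip(hs, ht)]
        feats += [a * b for a, b in zip(hs, ht)]
        z = sum(w * f for w, f in zip(self.out, feats)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))  # P(pair is parallel)


def extract_pairs(model: SiameseBiRNN, src_sents: List[List[str]],
                  tgt_sents: List[List[str]], threshold: float = 0.5
                  ) -> List[Tuple[List[str], List[str], float]]:
    """Keep the best-scoring target for each source if it clears the threshold."""
    pairs = []
    for s in src_sents:
        scored = [(model.score(s, t), t) for t in tgt_sents]
        p, best = max(scored, key=lambda pt: pt[0])
        if p >= threshold:
            pairs.append((s, best, p))
    return pairs


model = SiameseBiRNN(["the", "cat", "sleeps"], ["बिल्ली", "सोती", "है"])
print(model.score(["the", "cat", "sleeps"], ["बिल्ली", "सोती", "है"]))
```

In a real pipeline the candidate target sentences would come from the comparable Wikipedia article in the other language, the threshold would be tuned on held-out data, and the retained pairs would be added to the MT training corpus. The absolute-difference and product features are a common choice for Siamese pair classifiers; treat them here as an assumption.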

Bibliographic Details

Source: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2018, pp. 112-119 (8 pages)
Authors: Sree Harsha Ramesh; Krishna Prasad Sankaranarayanan
Original format: PDF

Similar Literature

1. Neural machine translation for low-resource languages without parallel corpora [J]. Alina Karakanta, Jon Dehdari, Josef van Genabith. Machine Translation, 2018, no. 1-2.

2. Extraction of Bilingual Dictionary from Comparable Corpora for Resource Scarce Languages [J]. Journal of Computational and Theoretical Nanoscience, 2020, no. 1.

3. Automatic induction of bilingual resources from aligned parallel corpora: application to shallow-transfer machine translation [J]. Helena M. Caseli, Maria das Graças V. Nunes, Mikel L. Forcada. Machine Translation, 2006, no. 4.

4. Neural Machine Translation for Low Resource Languages using Bilingual Lexicon Induced from Comparable Corpora [C]. Sree Harsha Ramesh, Krishna Prasad Sankaranarayanan. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018.

5. Parallel Sentence Detection in Comparable Corpora with Bilingual Word Embeddings for Low-Resource Languages [D]. John Cadigan, 2018.

6. Pseudotext Injection and Advance Filtering of Low-Resource Corpus for Neural Machine Translation [O]. Michael Adjeisah, Guohua Liu, Douglas Omwenga Nyabuga, 2021.

7. Extraction of domain-specific bilingual lexicon from comparable corpora: compositional translation and ranking [O]. Estelle Delpech, Béatrice Daille, Emmanuel Morin, 2012.

8. Construction of a Chinese-English Verb Lexicon for Embedded Machine Translation in Cross-Language Information Retrieval [R]. B. J. Dorr, D. Lin, G. Levow, 2002.
