Conference on the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)

Neural Machine Translation for Low Resource Languages using Bilingual Lexicon Induced from Comparable Corpora

Abstract

Resources for non-English languages are scarce. This paper addresses the problem in the context of machine translation by automatically extracting parallel sentence pairs from multilingual articles available on the Internet. We use an end-to-end Siamese bidirectional recurrent neural network to extract parallel sentences from comparable multilingual articles in Wikipedia. We then show that training on the harvested dataset improves BLEU scores for both NMT and phrase-based SMT systems on two low-resource language pairs, English-Hindi and English-Tamil, compared with training exclusively on the limited bilingual corpora collected for these language pairs.
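The abstract names the extraction model but not its internals. The following is a minimal sketch of the general Siamese idea under toy assumptions, not the paper's implementation: a single bidirectional Elman RNN encodes both the source and the target sentence with the same weights, and a logistic layer scores the pair from the element-wise product and absolute difference of the two encodings. The cell type, the feature combination, the names `SiameseBiRNN` and `harvest`, the random untrained weights, the best-match selection and the 0.5 threshold are all hypothetical choices for illustration; in practice the weights would first be trained on known parallel and non-parallel pairs.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


class SiameseBiRNN:
    """Hypothetical toy scorer: ONE bidirectional Elman RNN encodes both the
    source and the target sentence (shared weights), then a logistic layer
    scores the pair. Weights are random and untrained; only data flow is shown."""

    def __init__(self, emb_dim: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> List[Vector]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden = hidden
        # Separate input-to-hidden and hidden-to-hidden weights per direction.
        self.rnn: Dict[str, Tuple[List[Vector], List[Vector]]] = {
            d: (init(hidden, emb_dim), init(hidden, hidden)) for d in ("fwd", "bwd")
        }
        # Logistic head over [h_s * h_t ; |h_s - h_t|]; each part is 2*hidden long.
        self.w_out: Vector = init(1, 4 * hidden)[0]
        self.b_out = 0.0

    def _run(self, tokens: Sequence[Vector], direction: str) -> Vector:
        w_x, w_h = self.rnn[direction]
        h = [0.0] * self.hidden
        for x in tokens:
            # Elman update: h_i = tanh(W_x[i] . x + W_h[i] . h_prev)
            h = [
                math.tanh(
                    sum(a * b for a, b in zip(w_x[i], x))
                    + sum(a * b for a, b in zip(w_h[i], h))
                )
                for i in range(self.hidden)
            ]
        return h

    def encode(self, tokens: Sequence[Vector]) -> Vector:
        # Concatenate the last forward state with the last backward state.
        return self._run(tokens, "fwd") + self._run(list(reversed(tokens)), "bwd")

    def score(self, src: Sequence[Vector], tgt: Sequence[Vector]) -> float:
        h_s, h_t = self.encode(src), self.encode(tgt)  # same weights for both sides
        feats = [a * b for a, b in zip(h_s, h_t)] + [abs(a - b) for a, b in zip(h_s, h_t)]
        z = sum(w * f for w, f in zip(self.w_out, feats)) + self.b_out
        return 1.0 / (1.0 + math.exp(-z))  # probability-like "is parallel" score


def harvest(
    model: SiameseBiRNN,
    src_sents: Sequence[Sequence[Vector]],
    tgt_sents: Sequence[Sequence[Vector]],
    threshold: float = 0.5,
) -> List[Tuple[int, int, float]]:
    """Keep each source sentence's best-scoring target if it clears the threshold."""
    pairs: List[Tuple[int, int, float]] = []
    if not tgt_sents:
        return pairs
    for i, s in enumerate(src_sents):
        scores = [model.score(s, t) for t in tgt_sents]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


if __name__ == "__main__":
    rng = random.Random(1)

    def toy_sentence(n: int) -> List[Vector]:
        # Hypothetical 4-dim token embeddings; real input would be word vectors.
        return [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(n)]

    model = SiameseBiRNN(emb_dim=4, hidden=8)
    src = [toy_sentence(5), toy_sentence(3)]
    tgt = [toy_sentence(4), toy_sentence(6), toy_sentence(2)]
    print(harvest(model, src, tgt, threshold=0.5))
```

`harvest` returns `(source_index, target_index, score)` triples. With the random weights above the scores carry no meaning; the point of the sketch is that weight sharing makes both languages land in one representation space, so a single classifier can compare any source sentence with any target sentence from the comparable articles.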