Journal: The Journal of Artificial Intelligence Research

Improving Statistical Machine Translation for a Resource-Poor Language Using Related Resource-Rich Languages

Abstract

We propose a novel language-independent approach for improving machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve the translation from a resource-poor source language X_1 into a resource-rich language Y given a bi-text containing a limited number of parallel sentences for X_1-Y and a larger bi-text for X_2-Y for some resource-rich language X_2 that is closely related to X_1. This is achieved by taking advantage of the opportunities that vocabulary overlap and similarities between the languages X_1 and X_2 in spelling, word order, and syntax offer: (1) we improve the word alignments for the resource-poor language, (2) we further augment it with additional translation options, and (3) we take care of potential spelling differences through appropriate transliteration. The evaluation for Indonesian→English using Malay and for Spanish→English using Portuguese and pretending Spanish is resource-poor shows an absolute gain of up to 1.35 and 3.37 BLEU points, respectively, which is an improvement over the best rivaling approaches, while using much less additional data. Overall, our method cuts the amount of necessary "real" training data by a factor of 2-5.
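As a rough illustration of step (2), adding translation options from the related language, the sketch below merges two toy phrase tables and lets entries from the small X_1-Y table take precedence. It is not the authors' implementation: the function name `augment_phrase_table`, the dictionary layout, the binary origin flag, and all probabilities are assumptions made for this example.

```python
from typing import Dict, Tuple

# Toy phrase table: source phrase -> {target phrase: translation probability}.
PhraseTable = Dict[str, Dict[str, float]]
# Merged table: (source, target) -> (probability, origin flag).
MergedTable = Dict[Tuple[str, str], Tuple[float, int]]


def augment_phrase_table(primary: PhraseTable, auxiliary: PhraseTable) -> MergedTable:
    """Merge two toy phrase tables, preferring the primary (X_1-Y) entries.

    The origin flag is 1 for pairs found in the primary table and 0 for
    pairs that come only from the auxiliary (X_2-Y) table. This layout is
    a hypothetical simplification, not a real decoder format.
    """
    merged: MergedTable = {}
    # Start with every pair from the larger, related-language table.
    for src, targets in auxiliary.items():
        for tgt, prob in targets.items():
            merged[(src, tgt)] = (prob, 0)
    # Primary entries overwrite auxiliary ones for the same phrase pair.
    for src, targets in primary.items():
        for tgt, prob in targets.items():
            merged[(src, tgt)] = (prob, 1)
    return merged


# Made-up probabilities; the words are shared by Indonesian and Malay.
indonesian_english: PhraseTable = {"rumah": {"house": 0.9}}
malay_english: PhraseTable = {
    "rumah": {"house": 0.8, "home": 0.2},
    "buku": {"book": 1.0},
}

merged = augment_phrase_table(indonesian_english, malay_english)
for (src, tgt), (prob, origin) in sorted(merged.items()):
    print(f"{src} -> {tgt}: p={prob}, from_primary={origin}")
```

The origin flag is one simple way to let a downstream model tell "real" X_1-Y evidence from borrowed X_2-Y evidence. In a real system the two tables would come from separately trained alignment and phrase-extraction runs.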
