Journal: Procedia Computer Science

A Study of Statistical Machine Translation Methods for Under Resourced Languages

Abstract

This paper contributes an empirical study of the application of five state-of-the-art machine translation methods to the translation of low-resource languages. The methods studied were phrase-based, hierarchical phrase-based, operational sequence model, string-to-tree, and tree-to-string statistical machine translation (SMT), applied in both directions between English (en) and the under-resourced languages Lao (la), Myanmar (mm), and Thai (th). The performance of the machine translation systems was measured automatically in terms of BLEU and RIBES for all experiments. Our main finding was that the phrase-based SMT method generally gave the highest BLEU scores. This was counter to expectations, and we believe it indicates that this method may be more robust to limitations on the data set size. However, when evaluated with RIBES, the best scores came from methods other than phrase-based SMT, indicating that the other methods handled word reordering better even under the constraint of limited data. Our study achieved the highest reported results on the data sets for all translation language pairs.
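The contrast between BLEU and RIBES in the abstract can be illustrated with a toy sketch. BLEU is built on clipped n-gram precision, which rewards locally intact phrases, while RIBES is built on a rank correlation of word order, which rewards globally correct ordering. The code below is not the evaluation setup used in the paper. The function names and example sentences are hypothetical, the word-order score assumes each word occurs at most once per sentence, and the full metrics add further terms (a brevity penalty and a geometric mean over n-gram orders for BLEU; precision and brevity terms for RIBES) that are omitted here.

```python
from collections import Counter
from itertools import combinations


def ngram_precision(hyp, ref, n):
    """Clipped n-gram precision, the building block of BLEU (single reference)."""
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(hyp_ngrams.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in hyp_ngrams.items())
    return matched / total


def normalized_kendall_tau(hyp, ref):
    """Fraction of word pairs in the same relative order as the reference.

    This is the rank-correlation idea behind RIBES, simplified by assuming
    that every word occurs at most once in each sentence.
    """
    ref_pos = {word: i for i, word in enumerate(ref)}
    ranks = [ref_pos[word] for word in hyp if word in ref_pos]
    pairs = list(combinations(ranks, 2))
    if not pairs:
        return 0.0
    concordant = sum(1 for a, b in pairs if a < b)
    return concordant / len(pairs)


# Hypothetical sentences, chosen only to make the two scores disagree.
ref = "he went to the market yesterday".split()
hyp_local = "to the market he went yesterday".split()   # phrases intact, blocks swapped
hyp_global = "he to went market the yesterday".split()  # neighbours swapped, overall order kept

for name, hyp in [("hyp_local", hyp_local), ("hyp_global", hyp_global)]:
    print(name, ngram_precision(hyp, ref, 2), normalized_kendall_tau(hyp, ref))
```

In this toy case the hypothesis with intact phrases gets the higher bigram precision, and the hypothesis that preserves the overall word order gets the higher rank-correlation score. This shows how a system can rank first under one metric and not under the other.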
