Conference: Pacific Asia Conference on Language, Information and Computation

A Large-scale Study of Statistical Machine Translation Methods for Khmer Language

Abstract

This paper contributes the first published evaluation of the quality of automatic translation between Khmer (the official language of Cambodia) and twenty other languages, in both directions. The experiments were carried out using three different statistical machine translation approaches: phrase-based, hierarchical phrase-based, and the operation sequence model (OSM). In addition, two segmentation schemes for Khmer were studied: syllable segmentation and supervised word segmentation. The results show that word segmentation yielded the highest-quality machine translation in all of the experiments. Furthermore, with the exception of very distant language pairs, the OSM approach gave the highest-quality translations as measured by both BLEU and RIBES scores. For distant language pairs, our results showed the hierarchical phrase-based approach to be the most effective. An analysis of the experimental results indicated that Kendall's tau may be used directly as a means of selecting an appropriate machine translation approach for a given language pair.
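The abstract does not say exactly how Kendall's tau was computed for each language pair. One common way to quantify word-order divergence is the rank correlation between source word order and the positions of the aligned target words: values near 1 indicate largely monotone order, values near -1 indicate heavy reordering. RIBES, for instance, scores word order with a normalized Kendall's tau. The minimal sketch below assumes a one-to-one alignment given as a list of target indices (a simplification, since real aligners produce many-to-many links) and computes tau-a; the function names are hypothetical and not taken from the paper.

```python
from itertools import combinations


def kendall_tau(positions):
    """Kendall's tau-a between source order (0..n-1) and aligned target positions.

    positions[i] is the (assumed one-to-one) target index aligned to source word i.
    Tied target positions count as neither concordant nor discordant.
    """
    n = len(positions)
    if n < 2:
        return 1.0  # a single word is trivially monotone
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # i < j in source order; check whether the target order agrees
        if positions[i] < positions[j]:
            concordant += 1
        elif positions[i] > positions[j]:
            discordant += 1
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


def normalized_tau(positions):
    """Rescale tau from [-1, 1] to [0, 1], as RIBES does for its word-order term."""
    return (kendall_tau(positions) + 1) / 2


if __name__ == "__main__":
    print(kendall_tau([0, 1, 2, 3]))     # fully monotone alignment
    print(kendall_tau([3, 2, 1, 0]))     # fully reversed alignment
    print(normalized_tau([1, 0, 2, 3]))  # one local swap
```

Averaging such scores over a sentence-aligned corpus gives one number per language pair; a low average signals heavy reordering, the situation in which the abstract reports the hierarchical phrase-based approach performing best.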