Conference: Pacific Asia Conference on Language, Information and Computation

A Large-scale Study of Statistical Machine Translation Methods for Khmer Language

Abstract

This paper contributes the first published evaluation of the quality of automatic translation between Khmer (the official language of Cambodia) and twenty other languages, in both directions. The experiments were carried out using three different statistical machine translation approaches: phrase-based, hierarchical phrase-based, and the operation sequence model (OSM). In addition, two different segmentation schemes for Khmer were studied: syllable segmentation and supervised word segmentation. The results show that word segmentation yielded the highest-quality machine translation in all of the experiments. Furthermore, with the exception of very distant language pairs, the OSM approach gave the highest-quality translations as measured by both BLEU and RIBES scores. For distant languages, our results showed the hierarchical phrase-based approach to be the most effective. An analysis of the experimental results indicated that Kendall's tau may be used directly as a means of selecting an appropriate machine translation approach for a given language pair.
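Kendall's tau is commonly used to measure how much word reordering separates two languages: it compares the source-side order of words with the order of their aligned counterparts on the target side. This abstract does not say exactly how the paper computed tau or how it mapped tau values to an approach, so the following is only a minimal Python sketch. It assumes a one-to-one word alignment given as a list whose i-th entry is the target position of source word i. The function names `kendall_tau`, `normalized_tau` and `corpus_tau` are hypothetical.

```python
from itertools import combinations


def kendall_tau(target_positions: list[int]) -> float:
    """Kendall's tau between source order 0..n-1 and aligned target positions.

    Assumption: a one-to-one alignment, where target_positions[i] is the
    target-side position of source word i. Tied positions count as neither
    concordant nor discordant.
    """
    n = len(target_positions)
    if n < 2:
        return 1.0  # hypothetical convention: one word cannot be reordered
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # i < j on the source side; check whether the target side keeps that order
        if target_positions[i] < target_positions[j]:
            concordant += 1
        elif target_positions[i] > target_positions[j]:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def normalized_tau(target_positions: list[int]) -> float:
    """Rescale tau from [-1, 1] to [0, 1], as done in RIBES-style scoring."""
    return (kendall_tau(target_positions) + 1) / 2


def corpus_tau(alignments: list[list[int]]) -> float:
    """Hypothetical corpus-level summary: the mean sentence-level tau."""
    return sum(kendall_tau(a) for a in alignments) / len(alignments)


if __name__ == "__main__":
    print(kendall_tau([0, 1, 2, 3]))  # monotone order
    print(kendall_tau([3, 2, 1, 0]))  # fully reversed order
    print(corpus_tau([[0, 1, 2, 3], [0, 2, 1, 3]]))
```

Values near 1 indicate largely monotone word order. Values near or below 0 indicate heavy reordering, which is the distant-pair setting where the abstract reports the hierarchical phrase-based approach performing best. Any threshold for switching approaches would have to be taken from the paper itself rather than assumed here.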