International Conference on Intelligent Text Processing and Computational Linguistics

Statistical Machine Translation from and into Morphologically Rich and Low Resourced Languages

Abstract

In this paper, we consider the challenging problem of machine translation between two languages that are both morphologically rich and low-resourced: Sinhala and Tamil. We build a phrase-based Statistical Machine Translation (SMT) system and attempt to enhance it with unsupervised morphological analysis. When translating between these languages, morphological variation produces large numbers of out-of-vocabulary (OOV) terms between the training and test sets, which leads to reduced BLEU scores in evaluation. This early work shows that unsupervised morphological analysis with the Morfessor algorithm, which extracts morpheme-like units, significantly reduces the OOV problem and helps improve translation.
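
The OOV effect described in the abstract can be illustrated with a minimal sketch. It is not the paper's pipeline: the `toy_segment` function is a hypothetical suffix stripper standing in for a learned segmenter such as Morfessor, and the English-like word lists and suffix inventory are invented purely for illustration. The sketch only shows how an OOV rate is computed and why splitting words into morpheme-like units tends to lower it.

```python
def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training tokens."""
    vocab = set(train_tokens)
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


def toy_segment(word, suffixes):
    """Split off the longest matching suffix.

    Hypothetical stand-in for a learned segmenter; a real system would
    induce the segmentation from data instead of using a fixed suffix list.
    """
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) > len(suf):
            return [word[: -len(suf)], "+" + suf]
    return [word]


def segment_all(tokens, suffixes):
    """Segment every token and flatten the pieces into one token list."""
    return [piece for tok in tokens for piece in toy_segment(tok, suffixes)]


# Invented toy data (assumption): not Sinhala or Tamil, not from the paper.
SUFFIXES = ["s", "ed", "ing"]
train = ["walk", "walked", "talks", "jumping"]
test = ["walks", "talked", "jumped", "talking", "runs"]

# Word level: every inflected test form is unseen in training.
word_oov = oov_rate(train, test)

# Morpheme level: stems and suffixes are shared, so only "run" stays unseen.
morph_oov = oov_rate(segment_all(train, SUFFIXES), segment_all(test, SUFFIXES))

print(f"word-level OOV rate:     {word_oov:.2f}")
print(f"morpheme-level OOV rate: {morph_oov:.2f}")
```

On this toy data every test word is out of vocabulary at the word level, while after segmentation only the unseen stem remains OOV. Real corpora will show a smaller gap, and segmentation changes the token count, so the two rates are computed over different numbers of tokens.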