Conference: International Conference on Computational Linguistics

Confusion Network for Arabic Name Disambiguation and Transliteration in Statistical Machine Translation

Abstract

Arabic words are often ambiguous between name and non-name interpretations, frequently leading to incorrect name translations. We present a technique to disambiguate and transliterate names even if name interpretations do not exist or have relatively low probability distributions in the parallel training corpus. The key idea comprises named entity classing at the preprocessing step, decoding of a simple confusion network created from the name class label and the input word at the statistical machine translation step, and transliteration of names at the post-processing step. Human evaluations indicate that the proposed technique leads to a statistically significant translation quality improvement of highly ambiguous evaluation data sets without degrading the translation quality of a data set with very few names.
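The three steps named in the abstract (name classing, confusion-network decoding, transliteration) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the `$NAME` label, the prior `p_name`, the per-slot scoring rule, the romanized example tokens, and the lookup tables are all hypothetical. A real statistical decoder would also use a language model and reordering, which this sketch omits.

```python
from typing import Dict, List, Tuple

NAME_LABEL = "$NAME"  # hypothetical class token; the abstract does not name one

Arc = Tuple[str, float]               # (token, prior probability)
ConfusionNetwork = List[List[Arc]]    # one list of competing arcs per input slot


def build_confusion_network(
    words: List[str], name_flags: List[bool], p_name: float = 0.5
) -> ConfusionNetwork:
    """Preprocessing: each word flagged as a possible name gets two arcs,
    the word itself and the name class label. p_name is an assumed prior."""
    network: ConfusionNetwork = []
    for word, is_name in zip(words, name_flags):
        if is_name:
            network.append([(word, 1.0 - p_name), (NAME_LABEL, p_name)])
        else:
            network.append([(word, 1.0)])
    return network


def decode(
    network: ConfusionNetwork, translation_table: Dict[str, Tuple[str, float]]
) -> List[Tuple[str, str]]:
    """Toy monotone decoder: per slot, keep the arc with the highest
    prior * translation probability. The class label passes through as-is."""
    output: List[Tuple[str, str]] = []
    for arcs in network:
        source_word = arcs[0][0]  # the first arc always carries the input word
        best_target, best_score = source_word, -1.0
        for token, prior in arcs:
            if token == NAME_LABEL:
                target, score = NAME_LABEL, prior
            else:
                target, prob = translation_table.get(token, (token, 0.0))
                score = prior * prob
            if score > best_score:
                best_target, best_score = target, score
        output.append((best_target, source_word))
    return output


def transliterate_names(
    decoded: List[Tuple[str, str]], transliterations: Dict[str, str]
) -> str:
    """Post-processing: replace each class label with a transliteration of
    its source word (a lookup table stands in for a transliteration model)."""
    return " ".join(
        transliterations.get(source, source) if target == NAME_LABEL else target
        for target, source in decoded
    )


# Illustrative romanized tokens: "krym" can mean "generous" or the name Karim.
words = ["zAr", "krym"]
table = {"zAr": ("visited", 0.9), "krym": ("generous", 0.3)}
names = {"krym": "Karim"}

network = build_confusion_network(words, name_flags=[False, True])
print(transliterate_names(decode(network, table), names))
```

With the name flag set, the class-label arc outscores the weak non-name translation, so the second word is transliterated. With the flag cleared, the same word is translated as an ordinary word instead.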
