Journal: Information Processing & Management

An ensemble of transliteration models for information retrieval


Abstract

Transliteration is the phonetic rendering of proper names and technical terms, especially from languages written in the Roman alphabet into languages written in non-Roman scripts, for example from English into Korean, Japanese, or Chinese. Because transliterations are often representative index terms for documents, handling them properly is important for an effective information retrieval (IR) system. Handling them by dictionary lookup alone is limited, however, because transliterations are usually not registered in dictionaries. For this reason, many researchers have tried to overcome the problem with machine transliteration. In this paper, we propose a method for improving machine transliteration using an ensemble of three different transliteration models. Because a single transliteration model is limited in its ability to reflect all possible transliteration behaviors, several models should be used in a complementary way to achieve a high-performance machine transliteration system. This paper describes a method that produces transliterations with several machine transliteration models and ranks them using web data and the relevance scores given by each model. We report evaluation results for our ensemble transliteration model and experimental results on its impact on IR effectiveness. Machine transliteration tests on English-to-Korean and English-to-Japanese transliteration show that the proposed method achieves 78-80% word accuracy. Information retrieval tests on the KTSET and NTCIR-1 test collections show that our transliteration model can improve the performance of an information retrieval system by about 10-34%. (c) 2005 Elsevier Ltd. All rights reserved.
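The abstract names two steps: producing candidates with several models, then ranking them with web data and per-model relevance scores. It does not give the scoring formula, so the following is only a minimal sketch of one way such a ranking could be arranged. The function name `rank_candidates`, the linear mix controlled by `web_weight`, the normalisation, and the toy models and counts are all assumptions made for illustration, not the paper's method.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a model maps a source word to
# (candidate transliteration, relevance score) pairs.
Model = Callable[[str], List[Tuple[str, float]]]


def rank_candidates(
    word: str,
    models: List[Model],
    web_count: Callable[[str], int],
    web_weight: float = 0.5,
) -> List[str]:
    """Pool candidates from all models and rank them.

    Assumed scoring (not taken from the paper): each model's scores are
    normalised to sum to 1 and averaged over the models, then mixed
    linearly with the candidate's share of the total web frequency.
    """
    pooled: Dict[str, float] = defaultdict(float)
    for model in models:
        scored = model(word)
        total = sum(score for _, score in scored) or 1.0
        for candidate, score in scored:
            pooled[candidate] += (score / total) / len(models)

    counts = {c: web_count(c) for c in pooled}
    total_count = sum(counts.values()) or 1

    final = {
        c: (1.0 - web_weight) * pooled[c] + web_weight * counts[c] / total_count
        for c in pooled
    }
    # Highest score first; ties are broken by the candidate string.
    return sorted(final, key=lambda c: (-final[c], c))


# Toy stand-ins for three models and a web-frequency lookup (illustrative values).
def model_a(word: str) -> List[Tuple[str, float]]:
    return [("데이터", 0.6), ("데이타", 0.4)]


def model_b(word: str) -> List[Tuple[str, float]]:
    return [("데이타", 0.7), ("다타", 0.3)]


def model_c(word: str) -> List[Tuple[str, float]]:
    return [("데이터", 0.5), ("데이타", 0.5)]


TOY_WEB_COUNTS = {"데이터": 900, "데이타": 90, "다타": 10}


def toy_web_count(candidate: str) -> int:
    return TOY_WEB_COUNTS.get(candidate, 0)


ranked = rank_candidates("data", [model_a, model_b, model_c], toy_web_count)
model_only = rank_candidates(
    "data", [model_a, model_b, model_c], toy_web_count, web_weight=0.0
)
```

In this toy setup the model scores alone prefer one spelling, while adding the web-frequency term moves a different spelling to the top. This illustrates the complementary role the abstract assigns to web data, under the assumptions stated above.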
