Journal: Multimedia Tools and Applications

Towards a new possibilistic query translation tool for cross-language information retrieval


Abstract

Approaches to query translation in Cross-Language Information Retrieval (CLIR) have frequently relied on dictionaries, which suffer from translation ambiguity. Moreover, word-by-word query translation is not sufficient. In this paper, we propose, evaluate and compare a new possibilistic approach to query translation that aims to improve on previous dictionary-based ones. This approach uses a probability-to-possibility transformation as a means of introducing further tolerance into the query translation process. First, we identify noun phrases (NPs) in the source query and translate them as units using translation patterns and a language model. Second, source query terms that are not included in any selected NP are translated word by word using our new possibilistic approach to single-word translation. In doing so, we take into account all query words and their translations when choosing the suitable translation of a given word. We start from the idea that suitable translations of query terms tend to co-occur in target-language documents, unlike unsuitable ones. Finally, to increase the coverage of the bilingual dictionary, additional words and their translations are automatically generated from a parallel bilingual corpus. We tested our approach using the French-English parallel text corpus Europarl and the CLEF-2003 French-English CLIR test collection. The reported experiments show how the probability-to-possibility transformation-based approach performs compared with the probabilistic one and with some state-of-the-art CLIR tools.
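To make the probability-to-possibility step more concrete, the sketch below shows one widely used transformation, usually attributed to Dubois and Prade, in which the possibility of an outcome is the sum of the probabilities of all outcomes that are no more probable than it. The abstract does not state which transformation the authors use, so this is an assumption for illustration only. The function name `prob_to_poss`, the example word, and its translation probabilities are all hypothetical.

```python
from typing import Dict


def prob_to_poss(probs: Dict[str, float]) -> Dict[str, float]:
    """Map a probability distribution to a possibility distribution.

    Assumed rule (Dubois-Prade style, not confirmed by the abstract):
    pi(x) = sum of p(y) over every y with p(y) <= p(x).
    """
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    # Normalize so the input may be given as raw scores or counts.
    normalized = {k: v / total for k, v in probs.items()}
    return {
        k: sum(q for q in normalized.values() if q <= p)
        for k, p in normalized.items()
    }


# Hypothetical translation probabilities for one ambiguous French query word.
candidates = {"lawyer": 0.5, "avocado": 0.3, "advocate": 0.2}
possibilities = prob_to_poss(candidates)

# Rank candidate translations by possibility degree, highest first.
for word, degree in sorted(possibilities.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {degree:.2f}")
```

Under this rule the most probable candidate always receives possibility 1, and every other candidate receives a degree at least as large as its probability. This is one way to read the "further tolerance" the abstract mentions: lower-ranked translations are penalized less sharply than under the raw probabilities.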
