A maximum coherence model for dictionary-based cross-language information retrieval

Abstract

One key to cross-language information retrieval is how to efficiently resolve the translation ambiguity of queries, given their short length. This problem is even more challenging when only bilingual dictionaries are available, which is the focus of this paper. In previous research on cross-language information retrieval using bilingual dictionaries, word co-occurrence statistics are used to determine the most likely translations of queries. In this paper, we propose a novel statistical model, named the "maximum coherence model", which estimates translation probabilities of query words that are consistent with the word co-occurrence statistics. Unlike previous work, where a binary decision is made for the selection of translations, the new model maintains the uncertainty in translating query words when their sense ambiguity is difficult to resolve. Furthermore, the new model is able to estimate translations of multiple query words simultaneously, in contrast to many previous approaches where translations of individual query words are determined independently. Empirical studies with TREC datasets have shown that the maximum coherence model achieves a relative improvement of 10% to 40% in cross-language information retrieval, compared to other approaches that also use word co-occurrence statistics for sense disambiguation.
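The contrast the abstract draws between a binary choice of translation and retained translation probabilities can be illustrated with a small sketch. The abstract does not give the paper's objective or optimization procedure, so everything below is an assumption made for illustration: the function name `soft_translations`, the entropy-smoothed coherence objective, the coordinate-wise softmax update, the `temperature` parameter, and the toy dictionary and co-occurrence scores are all hypothetical.

```python
import math

def soft_translations(candidates, sim, temperature=0.2, iterations=50):
    """Illustrative only; not the paper's published algorithm.

    candidates: dict mapping each query word to its dictionary translations.
    sim: symmetric function (t1, t2) -> co-occurrence score of two translations.
    Returns a dict mapping each query word to {translation: probability}.
    """
    # Start from uniform translation probabilities for every query word.
    probs = {w: {t: 1.0 / len(ts) for t in ts} for w, ts in candidates.items()}
    for _ in range(iterations):
        for w, ts in candidates.items():
            # Expected co-occurrence of each candidate with the current
            # (soft) translations of all other query words.
            scores = {
                t: sum(p * sim(t, u)
                       for other, dist in probs.items() if other != w
                       for u, p in dist.items())
                for t in ts
            }
            # Softmax keeps a distribution instead of a hard 0/1 choice.
            top = max(scores.values())
            exps = {t: math.exp((s - top) / temperature) for t, s in scores.items()}
            total = sum(exps.values())
            probs[w] = {t: e / total for t, e in exps.items()}
    return probs

# Hypothetical toy data: two ambiguous query words, two senses each.
candidates = {
    "bank": ["bank_finance", "bank_river"],
    "interest": ["interest_money", "interest_hobby"],
}
cooc = {
    frozenset(("bank_finance", "interest_money")): 0.9,
    frozenset(("bank_finance", "interest_hobby")): 0.1,
    frozenset(("bank_river", "interest_money")): 0.1,
    frozenset(("bank_river", "interest_hobby")): 0.2,
}

def sim(a, b):
    # Unseen pairs get a score of zero.
    return cooc.get(frozenset((a, b)), 0.0)

for word, dist in soft_translations(candidates, sim).items():
    print(word, {t: round(p, 3) for t, p in dist.items()})
```

In this sketch the translations of all query words are updated against one another rather than independently, and a lower `temperature` pushes the result toward a hard selection while a higher one preserves more uncertainty.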
