Journal: Information Processing & Management
Structured queries, language modeling, and relevance modeling in cross-language information retrieval

Abstract

Two probabilistic approaches to cross-lingual retrieval are in wide use today: those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries: one derived from a conventional dictionary, and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. On the other hand, when queries are expanded, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus.

We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach which fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable levels of retrieval and that good translation probabilities confer a small but significant advantage. (C) 2004 Elsevier Ltd. All rights reserved.
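The contrast the abstract draws between the two approaches can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy Spanish tokens, the translation probabilities, the smoothing weight `lam`, and the choice of collection-based smoothing are all assumptions made for illustration. The first function treats every dictionary translation of a query term as one synonym class, in the spirit of structured query translation; the second folds translation probabilities into a smoothed query-likelihood score, in the spirit of the language modeling approach.

```python
import math
from collections import Counter


def synonym_tf(doc_tokens, translations):
    """Structured-query style: all translations of one source term form a
    single synonym class, so any occurrence counts toward the class."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in translations)


def translation_lm_score(query_terms, doc_tokens, collection_tokens,
                         trans_prob, lam=0.5):
    """Language-model style: log P(query | doc), where each source term s is
    generated through its translations t with probability P(s | t).

    trans_prob[s] maps each translation t to a hypothetical P(s | t).
    lam is an assumed linear-interpolation smoothing weight.
    """
    doc_counts = Counter(doc_tokens)
    coll_counts = Counter(collection_tokens)
    score = 0.0
    for s in query_terms:
        # Probability of s under the document model, via its translations.
        p_doc = sum(p * doc_counts[t] / len(doc_tokens)
                    for t, p in trans_prob[s].items())
        # Same quantity under the whole collection, used for smoothing.
        p_coll = sum(p * coll_counts[t] / len(collection_tokens)
                     for t, p in trans_prob[s].items())
        # Floor the mixture so an unseen term does not produce log(0).
        score += math.log(max(lam * p_doc + (1 - lam) * p_coll, 1e-12))
    return score


# Toy data (hypothetical): an English query term with two Spanish translations.
doc = ["banco", "rio", "agua"]
collection = doc + ["banco", "dinero", "casa"]
trans_prob = {"bank": {"banco": 0.7, "orilla": 0.3}}

class_count = synonym_tf(doc, trans_prob["bank"].keys())
lm_score = translation_lm_score(["bank"], doc, collection, trans_prob)
```

In the synonym-class view every translation contributes equally, so a conventional dictionary without probabilities is sufficient. The language-model view weights each translation, which is consistent with the abstract's finding that its advantage depends on a probabilistic dictionary derived from a parallel corpus.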