Journal: Information Sciences: An International Journal

Modeling term proximity for probabilistic information retrieval models

Abstract

Proximity among query terms has been found to be useful for improving retrieval performance. However, its application to classical probabilistic information retrieval models, such as Okapi's BM25, remains a challenging research problem. In this paper, we propose to improve the classical BM25 model by utilizing term proximity evidence. Four novel methods are proposed to model the proximity between query terms: a window-based N-gram Counting method, and Survival Analysis over three different statistics, namely the Poisson process, an exponential distribution, and an empirical function. Through extensive experiments on standard TREC collections, our proposed proximity-based BM25 model, called BM25P, is compared to strong state-of-the-art baselines, including the original unigram BM25 model, the Markov Random Field model, and the positional language model. According to the experimental results, the window-based N-gram Counting method and Survival Analysis over an exponential distribution are the most effective of the four proposed methods, and both lead to marked improvement over the baselines. This shows that the use of term proximity considerably enhances the retrieval effectiveness of classical probabilistic models. It is therefore recommended to deploy a term proximity component in retrieval systems that employ probabilistic models.
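
To make the idea of window-based proximity evidence concrete, the following is a minimal sketch of one way to count how often pairs of query terms co-occur inside a sliding window and to turn those counts into a bounded score. It is not the BM25P formulation from the paper, which the abstract does not spell out. The function names, the window size of 5, and the BM25-style saturation with k1 = 1.2 are all assumptions chosen for illustration.

```python
from itertools import combinations


def window_pair_count(tokens, term_a, term_b, window=5):
    """Count sliding windows of `window` tokens that contain both terms.

    Hypothetical helper: the paper's exact N-gram counting rule is not
    given in the abstract, so this simply slides a fixed-size window
    one token at a time.
    """
    count = 0
    # A document shorter than the window is treated as a single window.
    for start in range(max(1, len(tokens) - window + 1)):
        span = tokens[start:start + window]
        if term_a in span and term_b in span:
            count += 1
    return count


def proximity_score(tokens, query_terms, window=5, k1=1.2):
    """Sum a saturated co-occurrence score over all query term pairs.

    Assumption: the raw window count is damped with the same
    tf * (k1 + 1) / (tf + k1) shape that BM25 applies to term
    frequency, so many co-occurrences give diminishing returns.
    """
    total = 0.0
    for a, b in combinations(sorted(set(query_terms)), 2):
        c = window_pair_count(tokens, a, b, window)
        total += c * (k1 + 1.0) / (c + k1)
    return total


if __name__ == "__main__":
    doc = "term proximity helps when query terms appear close together".split()
    print(proximity_score(doc, ["term", "proximity"], window=3))
```

In a retrieval system, a score like this would typically be combined with the ordinary unigram BM25 score, for example through a weighted sum whose weight is tuned on held-out queries. That combination rule is likewise an assumption here rather than something stated in the abstract.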