Venue: International ACM SIGIR Conference on Research and Development in Information Retrieval
Mixture Model with Multiple Centralized Retrieval Algorithms for Result Merging in Federated Search

Abstract

Result merging is an important research problem in federated search: documents retrieved from the ranked lists of multiple selected information sources must be merged into a single list. State-of-the-art result merging algorithms such as Semi-Supervised Learning (SSL) and Sample-Agglomerate Fitting Estimate (SAFE) map document scores from different sources to comparable scores according to a single centralized retrieval algorithm, and rank the documents by those scores. Both SSL and SAFE arbitrarily select a single centralized retrieval algorithm for generating comparable document scores, which is problematic in a heterogeneous federated search environment, since a single centralized algorithm is often suboptimal across different information sources. Based on this observation, this paper proposes a novel approach to result merging that utilizes multiple centralized retrieval algorithms. One simple approach is to learn a single set of combination weights over the multiple centralized retrieval algorithms (e.g., via logistic regression) to compute comparable document scores. The paper shows that this simple approach generates suboptimal results, as it is not flexible enough to deal with heterogeneous information sources. A mixture probabilistic model is thus proposed that uses some training data to learn combination weights tailored to different types of information sources. An extensive set of experiments on three datasets demonstrates the effectiveness of the proposed approach.
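The core idea described in the abstract — computing a comparable score for each document as a source-type-dependent weighted combination of scores from several centralized retrieval algorithms — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the source types, the three hypothetical algorithms, and the hard-coded weight values are assumptions for demonstration; in the paper the weights are learned from training data.

```python
# Sketch: merging ranked lists using per-source-type mixture weights over
# the scores of multiple centralized retrieval algorithms.
import math

def softmax(xs):
    """Normalize raw weights into a probability-like mixture."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical learned parameters: one raw weight per centralized
# retrieval algorithm (e.g. BM25, language model, TF-IDF), separately
# for each type of information source.
RAW_WEIGHTS = {
    "news":    [2.0, 0.5, 0.1],   # news sources favor algorithm 1
    "medical": [0.3, 1.8, 0.4],   # medical sources favor algorithm 2
}

def merged_score(source_type, algo_scores):
    """Comparable score for one document: mixture-weighted combination
    of the scores assigned by each centralized algorithm."""
    w = softmax(RAW_WEIGHTS[source_type])
    return sum(wi * si for wi, si in zip(w, algo_scores))

def merge(results):
    """results: list of (doc_id, source_type, [score per algorithm]).
    Returns one list of (doc_id, score) ranked by comparable score."""
    scored = [(doc, merged_score(st, s)) for doc, st, s in results]
    return sorted(scored, key=lambda x: x[1], reverse=True)
```

The single-weight-vector baseline the paper criticizes corresponds to using one shared entry in `RAW_WEIGHTS` for all sources; the mixture model's flexibility comes from letting each source type weight the centralized algorithms differently.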
