Conference proceedings: Advances in Information Systems

Comparison of Normalization Techniques for Metasearch


Abstract

It is a well-known fact that combining the retrieval outputs of different search systems in response to a query, known as metasearch, improves performance on average, provided that the combined systems (1) have compatible outputs, (2) produce accurate estimates of the probability of relevance of documents, and (3) are independent of each other. The objective of a normalization technique is to address the first requirement: document scores from different retrieval outputs are brought onto a common scale so that they are comparable across the combined retrieval outputs. This has been a recent subject of research in the metasearch and information filtering fields. In this paper, we present a different perspective on multiple evidence combination and investigate various normalization techniques, mostly ad hoc in nature, with a special focus on SUM, which shifts the minimum score to zero and then scales the sum of the scores to one. This formal approach is equivalent to normalizing the distribution of scores of all documents in a retrieval output by dividing them by their sample mean. We have conducted extensive experiments using the ad hoc tracks of the third and fifth TREC collections and the CLEF'00 database. We argue that (1) the SUM normalization method is consistently better than the other traditionally proposed ones when combining outputs of search systems operating on a single database; and (2) for combining outputs of search systems operating on mutually exclusive databases, SUM is still a valuable alternative to weighting the score distributions of documents by the size of their databases.
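The SUM normalization described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `sum_normalize` and `fuse_by_summation`, the dictionary representation of a retrieval output, the handling of the all-equal-scores case, and the choice to fuse by simply adding normalized scores are all assumptions made for illustration.

```python
from collections import defaultdict


def sum_normalize(scores):
    """Shift the minimum score to zero, then scale so the scores sum to one.

    `scores` maps a document id to its raw retrieval score (hypothetical format).
    """
    if not scores:
        return {}
    lowest = min(scores.values())
    shifted = {doc: s - lowest for doc, s in scores.items()}
    total = sum(shifted.values())
    if total == 0:
        # Assumption: when every score is identical there is nothing to scale,
        # so all documents receive zero.
        return {doc: 0.0 for doc in shifted}
    return {doc: s / total for doc, s in shifted.items()}


def fuse_by_summation(runs):
    """Assumed fusion rule: add each document's normalized scores across runs."""
    fused = defaultdict(float)
    for run in runs:
        for doc, s in sum_normalize(run).items():
            fused[doc] += s
    # Highest combined score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Two hypothetical retrieval outputs on very different score scales.
run_a = {"d1": 3.0, "d2": 1.0, "d3": 2.0}
run_b = {"d1": 10.0, "d2": 30.0, "d4": 20.0}
ranking = fuse_by_summation([run_a, run_b])
```

After normalization, each run contributes a total score mass of one, so a system that reports scores in the tens does not dominate one that reports scores in single digits.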
