Improving collection selection with overlap awareness in P2P search engines

Abstract

Collection selection has been a research issue for years. Typically, in related work, precomputed statistics are employed in order to estimate the expected result quality of each collection, and subsequently the collections are ranked accordingly. Our thesis is that this simple approach is insufficient for several applications in which the collections typically overlap. This is the case, for example, for the collections built by autonomous peers crawling the web. We argue for the extension of existing quality measures using estimators of mutual overlap among collections and present experiments in which this combination outperforms CORI, a popular approach based on quality estimation. We outline our prototype implementation of a P2P web search engine, coined MINERVA, that allows handling large amounts of data in a distributed and self-organizing manner. We conduct experiments which show that taking overlap into account during collection selection can drastically decrease the number of collections that have to be contacted in order to reach a satisfactory level of recall, which is a great step toward the feasibility of distributed web search.
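
The idea of combining a quality estimate with an overlap estimate can be illustrated with a minimal sketch. It is not the MINERVA implementation: the function name `select_collections`, the multiplicative quality-times-novelty score, and the use of exact document-ID sets are all assumptions made for illustration. A real P2P system would more plausibly exchange compact overlap estimators than full ID sets, and the quality scores here are placeholders for whatever a CORI-style estimator would return.

```python
from typing import Dict, List, Set


def select_collections(
    quality: Dict[str, float],
    docs: Dict[str, Set[int]],
    k: int,
) -> List[str]:
    """Greedily pick up to k collections, discounting quality by overlap.

    quality: hypothetical per-collection quality score (higher is better).
    docs:    hypothetical exact document-ID sets, standing in for an
             overlap estimator.
    """
    covered: Set[int] = set()      # documents held by already chosen collections
    remaining = set(quality)       # collections not yet chosen
    chosen: List[str] = []

    while remaining and len(chosen) < k:
        def score(name: str) -> float:
            size = len(docs[name])
            # Novelty: fraction of this collection not yet covered.
            novelty = len(docs[name] - covered) / size if size else 0.0
            return quality[name] * novelty

        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=score)
        if score(best) == 0.0:
            break                  # nothing new left to gain
        chosen.append(best)
        covered |= docs[best]
        remaining.remove(best)

    return chosen


# Toy data: B has high quality but mostly duplicates A.
quality = {"A": 0.9, "B": 0.85, "C": 0.5}
docs = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {6, 7}}

by_quality_only = sorted(quality, key=quality.get, reverse=True)[:2]
overlap_aware = select_collections(quality, docs, k=2)
```

On this toy data, ranking by quality alone contacts A and B, which together hold five distinct documents, while the overlap-aware variant contacts A and C and reaches six. This mirrors, at a very small scale, the abstract's claim that fewer collections need to be contacted for a given level of recall.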
