Published in: Advances in Information Retrieval (conference proceedings)

Improving Query Correctness Using Centralized Probably Approximately Correct (PAC) Search


Abstract

A non-deterministic architecture for information retrieval, known as probably approximately correct (PAC) search, has recently been proposed. However, for equivalent storage and computational resources, PAC achieves only 63% of the performance of a deterministic system. We propose a modification to the PAC architecture that introduces a centralized query coordination node. To respond to a query, random sampling of computers is replaced with pseudo-random sampling that uses the query as a seed. For queries that occur frequently, this pseudo-random sample is then iteratively refined so that performance improves with each iteration. A theoretical analysis is presented that provides an upper bound on the performance of any iterative algorithm. Two heuristic algorithms are then proposed to iteratively improve the performance of PAC search. Experiments on the TREC-8 dataset demonstrate that performance can improve from 67% to 96% in just 10 iterations and continues to improve with further iterations. Thus, for queries that occur 10 or more times, the performance of a non-deterministic PAC architecture can closely match that of a deterministic system.
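
The query-seeded sampling step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `seed_from_query` and `select_nodes`, the query normalization, and the use of SHA-256 to derive the seed are all assumptions made for illustration. The sketch only shows why seeding with the query makes the sample repeatable, so that a coordinator could revisit and refine the same set of computers; the refinement heuristics themselves are not detailed in the abstract and are not shown.

```python
import hashlib
import random
from typing import List


def seed_from_query(query: str) -> int:
    """Derive a stable integer seed from the query text (hypothetical helper).

    A cryptographic hash is used instead of Python's built-in hash(),
    because hash() on strings is randomized per process by default.
    """
    normalized = " ".join(query.lower().split())  # assumed normalization
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def select_nodes(query: str, num_nodes: int, sample_size: int) -> List[int]:
    """Pick sample_size distinct node ids out of num_nodes.

    The choice is pseudo-random but fully determined by the query,
    so repeated occurrences of a query reach the same nodes.
    """
    rng = random.Random(seed_from_query(query))
    return rng.sample(range(num_nodes), sample_size)
```

Under these assumptions, two calls with the same query text return the same list of node ids, while different queries are spread across different subsets of nodes.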
