
Answering keyword queries through cached subqueries in best match retrieval models



Abstract

Caching is one of the techniques that Information Retrieval Systems (IRS) and Web Search Engines (WSEs) use to reduce processing costs and attain faster response times. In this paper we introduce Top-K SCRC (Set Cover Results Cache), a novel technique for results caching that aims at maximizing the utilization of the cache. Identical queries are treated as in plain results caching (i.e., their evaluation does not require accessing the index), while combinations of cached subqueries are exploited as in posting lists caching; however, the exploited subqueries are not necessarily single-word queries. The problem of finding the right set of cached subqueries to answer an incoming query is actually the Exact Set Cover problem. This technique can be applied in any best-match retrieval model that is based on a decomposable scoring function, and we show that several best-match retrieval models (i.e., VSM, Okapi BM25 and hybrid retrieval models) rely on such scoring functions. To increase the capacity (in queries) of the cache, only the top-K results of each cached query are stored, and we introduce metrics for measuring the accuracy of the composed top-K answer. By analyzing queries submitted to real-world WSEs, we verified that there is a significant proportion of queries whose terms are the union of the terms of other queries. The comparative evaluation over traces of real query sets showed that the Top-K SCRC is on average two times faster than a plain Top-K RC for the same cache size.
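The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the scoring function is additive over query terms, so that the score of a document for a query is the sum of its scores for disjoint subqueries. The names `find_exact_cover` and `compose_top_k`, the cache layout, and the sample scores are all hypothetical. Because only the top-K results of each cached subquery are kept, the composed ranking is in general an approximation of the true top-K answer.

```python
def find_exact_cover(query_terms, cached_keys):
    """Return pairwise-disjoint cached subqueries whose union equals
    query_terms, or None if no such combination exists.
    Simple backtracking search; a hypothetical helper, not the paper's algorithm."""
    target = frozenset(query_terms)
    # Only cached subqueries fully contained in the query can take part.
    # Larger subqueries are tried first so that covers with few parts are found early.
    candidates = sorted(
        (key for key in cached_keys if key <= target),
        key=lambda key: (-len(key), sorted(key)),
    )

    def search(remaining, start):
        if not remaining:
            return []
        for i in range(start, len(candidates)):
            candidate = candidates[i]
            # Requiring candidate <= remaining keeps the chosen sets disjoint.
            if candidate <= remaining:
                rest = search(remaining - candidate, i + 1)
                if rest is not None:
                    return [candidate] + rest
        return None

    return search(target, 0)


def compose_top_k(cover, cache, k):
    """Sum per-document scores across the subqueries of the cover
    (assumes an additive, decomposable scoring function) and keep the k best."""
    totals = {}
    for subquery in cover:
        for doc_id, score in cache[subquery]:
            totals[doc_id] = totals.get(doc_id, 0.0) + score
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical cache: subquery terms -> stored top-K list of (doc_id, score).
cache = {
    frozenset({"cheap", "flights"}): [("d1", 2.0), ("d2", 1.5)],
    frozenset({"athens"}): [("d2", 1.0), ("d3", 0.8)],
    frozenset({"flights"}): [("d4", 0.9)],
}

cover = find_exact_cover({"cheap", "flights", "athens"}, cache.keys())
if cover is not None:
    print(compose_top_k(cover, cache, 2))
```

In this sketch the query "cheap flights athens" is answered from the cached results of "cheap flights" and "athens" without touching the index. A query for which no exact cover exists would fall back to normal index evaluation.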

Publication details

  • Source
    Journal of Intelligent Information Systems | 2015, Issue 1 | pp. 67-106 | 40 pages
  • Author affiliations

    Institute of Computer Science (ICS), Foundation for Research and Technology - Hellas (FORTH), Science and Technology Park of Crete, Vassilika Vouton, P.O. Box 1385, Heraklion, Crete, 7110, Greece; Computer Science Department, University of Crete, Voutes Campus, 700 13 Heraklion, Crete, Greece

    Institute of Computer Science (ICS), Foundation for Research and Technology - Hellas (FORTH), Science and Technology Park of Crete, Vassilika Vouton, P.O. Box 1385, Heraklion, Crete, 7110, Greece; Computer Science Department, University of Crete, Voutes Campus, 700 13 Heraklion, Crete, Greece

  • Original format: PDF
  • Language: English
  • Keywords

    Information retrieval; Query processing; Retrieval models; Ranking; Web search engines; Query log analysis;

