Twenty-ninth International Conference on Very Large Databases; Sep 9-12, 2003; Berlin, Germany

Optimized Query Execution in Large Search Engines with Global Page Ordering


Abstract

Large web search engines have to answer thousands of queries per second with interactive response times. A major factor in the cost of executing a query is given by the lengths of the inverted lists for the query terms, which increase with the size of the document collection and are often in the range of many megabytes. To address this issue, IR and database researchers have proposed pruning techniques that compute or approximate term-based ranking functions without scanning over the full inverted lists. Over the last few years, search engines have incorporated new types of ranking techniques that exploit aspects such as the hyperlink structure of the web or the popularity of a page to obtain improved results. We focus on the question of how such techniques can be efficiently integrated into query processing. In particular, we study pruning techniques for query execution in large engines in the case where we have a global ranking of pages, as provided by Pagerank or any other method, in addition to the standard term-based approach. We describe pruning schemes for this case and evaluate their efficiency on an experimental cluster-based search engine with 120 million web pages. Our results show that there is significant potential benefit in such techniques.
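
To make the idea of pruning under a global page ordering concrete, here is a minimal sketch. It is not the scheme evaluated in the paper; it only illustrates one way early termination can work under the following assumptions, none of which come from the abstract: document IDs are assigned in order of decreasing global score, so doc 0 is the highest-ranked page; the final score is the global score plus the sum of the per-term scores; queries use AND semantics; and an upper bound on each term's score is known. The function name `top_k_pruned` and all parameter names are hypothetical.

```python
import heapq


def top_k_pruned(postings, rank_score, max_term_score, k):
    """Return (top-k results, postings touched) for an AND query.

    postings: one list per query term of (doc_id, term_score) pairs,
        sorted by ascending doc_id. At least one term is assumed.
    rank_score: rank_score[doc_id] is the global score. Doc IDs are assumed
        to be assigned so that this value never increases with doc_id.
    max_term_score: per-term upper bound on term_score.
    k: number of results wanted (assumed >= 1).
    """
    pos = [0] * len(postings)          # read cursor into each posting list
    term_bound = sum(max_term_score)   # best possible term contribution
    heap = []                          # min-heap of (score, doc_id)
    touched = 0                        # postings stepped over, for comparison

    while all(p < len(pl) for p, pl in zip(pos, postings)):
        # The largest doc ID under any cursor is the next possible match.
        candidate = max(pl[p][0] for p, pl in zip(pos, postings))

        # Every remaining document has a global score no higher than the
        # candidate's, so this bounds every score still to come.
        if len(heap) == k and rank_score[candidate] + term_bound <= heap[0][0]:
            break

        # Advance every list to the candidate and check that it is present.
        in_all = True
        for i, pl in enumerate(postings):
            while pos[i] < len(pl) and pl[pos[i]][0] < candidate:
                pos[i] += 1
                touched += 1
            if pos[i] == len(pl) or pl[pos[i]][0] != candidate:
                in_all = False
        if not in_all:
            continue

        score = rank_score[candidate] + sum(
            pl[p][1] for p, pl in zip(pos, postings)
        )
        if len(heap) < k:
            heapq.heappush(heap, (score, candidate))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, candidate))
        for i in range(len(postings)):
            pos[i] += 1
            touched += 1

    return sorted(heap, reverse=True), touched


# Toy data (invented for illustration): 8 pages, 2 query terms.
rank_score = [100, 90, 80, 50, 40, 30, 20, 10]
term_a = [(0, 20), (1, 10), (2, 20), (4, 20), (6, 30), (7, 10)]
term_b = [(1, 20), (2, 10), (3, 30), (4, 10), (6, 30), (7, 20)]

results, touched = top_k_pruned([term_a, term_b], rank_score, [30, 30], k=2)
print(results, touched)  # [(120, 1), (110, 2)] 5
```

In this toy run the scan stops after touching 5 of the 12 postings, because no page ranked at or below doc 4 can reach the current second-best score of 110. How much work such a bound saves in practice depends on how the global and term-based scores are weighted against each other.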
