Journal: ACM Transactions on Information Systems

Static Index Pruning in Web Search Engines: Combining Term and Document Popularities with Query Views


Abstract

Static index pruning techniques permanently remove a presumably redundant part of an inverted file to reduce the file size and query processing time. These techniques differ in how they decide which parts of an index can be removed safely, that is, without changing the top-ranked query results. As defined in the literature, the query view of a document is the set of query terms that access this particular document, that is, the terms of the queries that retrieve this document among their top results. In this paper, we first propose using query views to improve the quality of the top results obtained from the pruned index, as measured against the results from the original index. We incorporate query views into a number of static pruning strategies, namely term-centric, document-centric, term-popularity-based, and document-access-popularity-based approaches, and show that the new strategies considerably outperform their counterparts, especially at higher levels of pruning and for both disjunctive and conjunctive query processing. Additionally, we combine the notions of term and document access popularity to form new pruning strategies, and further extend these strategies with query views. The new strategies improve the result quality, especially for conjunctive query processing, which is the default and most common search mode of a search engine.
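
To make the idea of a query view concrete, the following is a minimal Python sketch and not the paper's implementation. It assumes a toy in-memory index (a dict from term to scored postings) and a replayed query log. The function names `build_query_views` and `prune_term_centric`, the `keep_fraction` parameter, and the sample data are all hypothetical. The pruning rule shown, which keeps the top-scoring fraction of each posting list and additionally protects any posting whose term appears in the document's query view, is one plausible reading of a query-view-extended term-centric strategy.

```python
from collections import defaultdict


def build_query_views(query_results):
    """Map each document to the set of query terms that retrieved it.

    query_results: iterable of (query_terms, top_doc_ids) pairs, e.g. a
    query log replayed against the unpruned index (hypothetical input).
    """
    views = defaultdict(set)
    for terms, top_docs in query_results:
        for doc in top_docs:
            views[doc].update(terms)
    return views


def prune_term_centric(index, views, keep_fraction):
    """Keep the highest-scoring postings of each term, plus every posting
    whose term belongs to the query view of its document.

    index: dict mapping term -> list of (doc_id, score) postings.
    """
    pruned = {}
    for term, postings in index.items():
        ranked = sorted(postings, key=lambda p: p[1], reverse=True)
        n_keep = int(len(ranked) * keep_fraction)
        kept = ranked[:n_keep]
        # Assumption: postings protected by a query view are never pruned.
        kept += [p for p in ranked[n_keep:] if term in views.get(p[0], ())]
        pruned[term] = kept
    return pruned


# Toy data, invented for illustration only.
index = {
    "web": [("d1", 0.9), ("d2", 0.5), ("d3", 0.1)],
    "search": [("d1", 0.2), ("d3", 0.8)],
}
query_log = [(("web", "search"), ["d3"])]

views = build_query_views(query_log)
pruned = prune_term_centric(index, views, keep_fraction=0.5)
```

In this toy run the low-scoring posting of `d3` under `web` survives pruning because the query "web search" retrieved `d3`, whereas a purely score-based cut would have dropped it. That matters most for conjunctive processing, where a document must appear in the posting list of every query term to be returned.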
