Conference: International Conference on Very Large Data Bases

Bitlist: New Full-text Index for Low Space Cost and Efficient Keyword Search


Abstract

Web search engines face significant performance challenges caused by the huge number of Web pages and a steadily growing number of Web users. The key to addressing these challenges is a compact structure that indexes Web documents in little space while still processing keyword searches very quickly. Unfortunately, current solutions typically treat space optimization and search improvement separately. As a result, they either save space at the cost of slow search, or allow fast keyword search at the cost of a huge space requirement. In this paper, we propose bitlist, a novel structure that combines a low space requirement with fast keyword search. Specifically, based on a simple yet very efficient encoding scheme, bitlist uses a single number to encode a set of integer document IDs, which keeps space low, and adopts fast bitwise operations for very efficient Boolean keyword search. Our extensive experiments on real and synthetic data sets verify that bitlist outperforms the recently proposed inverted list compression solution [23,22], using 36.71% less space and processing queries 61.91% faster, and achieves running time comparable to [8] with significantly lower space.
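The abstract does not spell out the encoding scheme, so the following is only a minimal sketch of the general idea it describes: one integer per keyword stands for a set of document IDs, and Boolean queries reduce to bitwise operations. It assumes a plain bitmap (bit i set means document i contains the term), which may differ from the paper's actual encoding. The function names `encode`, `decode`, `build_index`, `search_and`, and `search_or` are hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, List


def encode(doc_ids: Iterable[int]) -> int:
    """Pack non-negative document IDs into one integer (bit i set <=> ID i present)."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def decode(bits: int) -> List[int]:
    """Recover the sorted list of document IDs from the packed integer."""
    ids = []
    position = 0
    while bits:
        if bits & 1:
            ids.append(position)
        bits >>= 1
        position += 1
    return ids


def build_index(docs: Dict[int, str]) -> Dict[str, int]:
    """Map each whitespace-separated term to the packed set of documents containing it."""
    index: Dict[str, int] = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index[term] = index.get(term, 0) | (1 << doc_id)
    return index


def search_and(index: Dict[str, int], terms: List[str]) -> int:
    """Documents containing every term: bitwise AND of the per-term integers."""
    if not terms:
        return 0
    result = index.get(terms[0], 0)
    for term in terms[1:]:
        result &= index.get(term, 0)
    return result


def search_or(index: Dict[str, int], terms: List[str]) -> int:
    """Documents containing at least one term: bitwise OR of the per-term integers."""
    result = 0
    for term in terms:
        result |= index.get(term, 0)
    return result


# Toy corpus (illustrative data, not from the paper).
docs = {
    0: "fast keyword search",
    1: "compact index",
    2: "keyword index search",
}
index = build_index(docs)
both = decode(search_and(index, ["keyword", "search"]))
either = decode(search_or(index, ["compact", "fast"]))
```

In this sketch a conjunctive query costs one AND per extra term regardless of how many documents match, which is the kind of benefit the abstract attributes to bitwise operations. A raw bitmap like this grows with the largest document ID, so it does not by itself reproduce the space savings the paper reports.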
