IEEE International Conference on Data Engineering

A tunable compression framework for bitmap indices

Abstract

Bitmap indices are widely used for large read-only repositories in data warehouses and scientific databases. Their binary representation allows for the use of bitwise operations and specialized run-length compression techniques. Because of a trade-off between compression and query efficiency, bitmap compression schemes are aligned to a fixed encoding length (typically the word length) to avoid explicit decompression at query time. In general, smaller encoding lengths provide better compression but require more decoding during query execution. However, when the difference in size is considerable, smaller encodings can also provide better execution time. We posit that a tailored encoding length for each bit vector will provide better performance than a one-size-fits-all approach. We present a framework that optimizes compression and query efficiency by allowing bitmaps to be compressed using variable encoding lengths while still maintaining alignment to avoid explicit decompression. Efficient algorithms are introduced to process queries over bitmaps compressed using different encoding lengths. An input parameter controls the aggressiveness of the compression, giving the user the ability to tune the trade-off between space and query time. Our empirical study shows that this approach achieves significant improvements in both query time and compression ratio for synthetic and real data sets. Compared to 32-bit WAH (Word-Aligned Hybrid), VAL-WAH produces up to 1.8× smaller bitmaps and achieves query times that are 30% faster.
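To make the effect of the encoding length concrete, the following is a minimal sketch of a simplified WAH-style run-length encoder whose word length is a parameter. It is an illustration under stated assumptions, not the paper's VAL-WAH format or its query algorithms: the names `wah_encode` and `compressed_bits` are hypothetical, codes are kept as Python tuples rather than packed into machine words, and the layout (one flag bit, one fill-value bit, the rest a run counter) follows the commonly described WAH scheme.

```python
from typing import List, Tuple

# A code is ("fill0" | "fill1", run length in groups) or ("lit", group value).
Code = Tuple[str, int]


def wah_encode(bits: List[int], word_len: int) -> List[Code]:
    """Simplified WAH-style encoding (illustrative, not the paper's VAL-WAH).

    Each code word of word_len bits carries word_len - 1 payload bits.
    """
    group = word_len - 1
    # Assumed layout: 1 flag bit + 1 fill-value bit, so the run counter
    # of a fill word has word_len - 2 bits.
    max_run = (1 << (word_len - 2)) - 1
    # Pad the tail with zeros so the bitmap splits into whole groups.
    padded = bits + [0] * (-len(bits) % group)
    codes: List[Code] = []
    for i in range(0, len(padded), group):
        chunk = padded[i:i + group]
        ones = sum(chunk)
        if ones == 0 or ones == group:
            # Homogeneous group: extend the previous fill if possible.
            kind = "fill1" if ones else "fill0"
            if codes and codes[-1][0] == kind and codes[-1][1] < max_run:
                codes[-1] = (kind, codes[-1][1] + 1)
            else:
                codes.append((kind, 1))
        else:
            # Mixed group: store the payload bits verbatim as a literal.
            codes.append(("lit", int("".join(map(str, chunk)), 2)))
    return codes


def compressed_bits(bits: List[int], word_len: int) -> int:
    """Compressed size in bits: one word_len-bit word per code."""
    return len(wah_encode(bits, word_len)) * word_len
```

For the 62-bit input `[1] + [0] * 61`, this sketch needs one literal plus one fill at either word length, so it occupies 64 bits with `word_len=32` and 16 bits with `word_len=8`. That is the space side of the trade-off the abstract describes; the extra decoding cost of shorter codes at query time is not modeled here.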
