首页> 外文会议> >Minimizing I/O Costs of Multi-Dimensional Queries with Bitmap Indices
【24h】

Minimizing I/O Costs of Multi-Dimensional Queries with Bitmap Indices

机译:使用位图索引最小化多维查询的I / O成本

获取原文

摘要

Bitmap indices have been widely used in scientific applications and commercial systems for processing complex, multi-dimensional queries where traditional tree-based indices would not work efficiently. A common approach for reducing the size of a bitmap index for high cardinality attributes is to group ranges of values of an attribute into bins and then build a bitmap for each bin rather than a bitmap for each value of the attribute. Binning reduces storage costs, however, results of queries based on bins often require additional filtering for discarding false positives, i.e., records in the result that do not satisfy the query constraints. This additional filtering, also known as candidate checking, requires access to the base data on disk and involves significant I/O costs. This paper studies strategies for minimizing the I/O costs for candidate checking for multi-dimensional queries. This is done by determining the number of bins allocated for each dimension and then placing bin boundaries in optimal locations. Our algorithms use knowledge of data distribution and query workload. We derive several analytical results concerning optimal bin allocation for a probabilistic query model. Our experimental evaluation with real life data shows an average I/O cost improvement of at least a factor of 10 for multi-dimensional queries on datasets from two different applications. Our experiments also indicate that the speedup increases with the number of query dimensions.
机译:位图指数已广泛用于科学应用程序和商业系统,用于处理复杂的多维查询,其中传统的基于树的索引无法有效地工作。用于降低高基数属性的位图索引的大小的常用方法是将属性的值分组到箱中,然后为每个垃圾邮件构建位图而不是属性的每个值的位图。 Binning降低了存储成本,但是,基于频体的查询结果通常需要额外的过滤,以丢弃误报,即记录不满足查询约束的结果。该附加滤波,也称为候选检查,需要访问磁盘上的基本数据,并且涉及显着的I / O成本。本文研究了最小化候选人检查多维查询的I / O成本的策略。这是通过确定为每个维度分配的箱数,然后在最佳位置放置箱边界的箱子的数量来完成。我们的算法使用数据分发和查询工作负载的知识。我们派生了几种关于概率查询模型的最佳BIN分配的分析结果。我们的实验评估与真实生活数据显示了来自两个不同应用程序的数据集上的多维查询的至少10个因素的平均I / O成本提高。我们的实验还表明加速度随着查询尺寸的数量而增加。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号