首页> 外文会议> >Query result size estimation using a novel histogram-like technique: the rectangular attribute cardinality map
【24h】

Query result size estimation using a novel histogram-like technique: the rectangular attribute cardinality map

机译:使用类似于直方图的新技术查询结果大小:矩形属性基数图

获取原文

摘要

Current database systems utilize histograms to approximate frequency distributions of attribute values of relations. These are used to efficiently estimate query result sizes and access plan costs. Even though they have been in use for nearly two decades, there has been no significant mathematical techniques (other than those used in statistics for traditional histogram approximations) to study them. We introduce a new histogram-like approximation strategy called the Rectangular Attribute Cardinality Map (R-ACM), that aims to approximate the density of the underlying attribute values using the philosophies of numerical integration. In this new histogram-like approximation method, the density function within a given sector is approximated by a rectangular cell, where the height of the cell is obtained so as to guarantee that the actual probability density differs from the approximated one by a maximum of a user specified tolerance, /spl tau/. Furthermore, unlike the two traditional histogram types, namely equi-width and equi-depth, the R-ACM is neither equi-width nor equi-depth. Analytically, we show that for the R-ACM, the distribution of an attribute value within the sector is binomially distributed. This permits us to derive worst-case and average case results for the estimation errors of the probability mass itself. Our theoretical results, which include a rigorous maximum likelihood and expected case analyses, and an extensive set of experiments demonstrate that the R-ACM scheme (which is essentially histogram-like) is much more accurate than the traditional histograms for query result size estimation. Due to its high accuracy and low construction costs, we hope that it could become an invaluable tool for query optimization in the future database systems.
机译:当前的数据库系统利用直方图来近似关系的属性值的频率分布。这些用于有效地估计查询结果的大小和访问计划的成本。即使它们已经使用了将近二十年,也没有重要的数学技术(用于传统直方图近似的统计技术除外)来研究它们。我们引入了一种新的类似于直方图的近似策略,称为矩形属性基数图(R-ACM),其目的是使用数值积分的原理来近似基础属性值的密度。在这种新的类似于直方图的近似方法中,给定扇区内的密度函数由一个矩形像元近似,在该矩形像元中获得像元的高度,以确保实际概率密度与近似的近似值相差最大为a。用户指定的公差/ spl tau /。此外,与两种传统的直方图类型(即等宽和等深)不同,R-ACM既不是等宽也不是等深。从分析上看,我们表明对于R-ACM,扇区内属性值的分布是二项分布。这使我们能够针对概率质量本身的估计误差得出最坏情况和平均情况的结果。我们的理论结果(包括严格的最大似然分析和预期的案例分析)以及大量的实验表明,R-ACM方案(本质上类似于直方图)比用于查询结果大小估计的传统直方图要准确得多。由于它的高准确性和低构建成本,我们希望它可以成为将来数据库系统中查询优化的宝贵工具。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号