CORDS

Abstract

The rich dependency structure found in the columns of real-world relational databases can be exploited to great advantage, but can also cause query optimizers, which usually assume that columns are statistically independent, to underestimate the selectivities of conjunctive predicates by orders of magnitude. We introduce CORDS, an efficient and scalable tool for automatic discovery of correlations and soft functional dependencies between columns. CORDS searches for column pairs that might have interesting and useful dependency relations by systematically enumerating candidate pairs and simultaneously pruning unpromising candidates using a flexible set of heuristics. A robust chi-squared analysis is applied to a sample of column values in order to identify correlations, and the number of distinct values in the sampled columns is analyzed to detect soft functional dependencies. CORDS can be used as a data mining tool, producing dependency graphs that are of intrinsic interest. We focus primarily on the use of CORDS in query optimization. Specifically, CORDS recommends groups of columns on which to maintain certain simple joint statistics. These "column-group" statistics are then used by the optimizer to avoid naive selectivity estimates based on inappropriate independence assumptions. This approach, because of its simplicity and judicious use of sampling, is relatively easy to implement in existing commercial systems, has very low overhead, and scales well to the large numbers of columns and large table sizes found in real-world databases. Experiments with a prototype implementation show that the use of CORDS in query optimization can speed up query execution by an order of magnitude. CORDS can be used in tandem with query feedback systems such as the LEO learning optimizer, leveraging the infrastructure of such systems to correct bad selectivity estimates and ameliorating the poor performance of feedback systems during slow learning phases.
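The two sample-based tests named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the table (car model, make, color), the sample size of 500, and all function names are hypothetical, and the normalizations used here (chi-squared divided by n(d - 1), and the distinct-value ratio |C1| / |C1, C2|) are assumptions about how such tests are commonly scored. The last function shows how an independence assumption underestimates the selectivity of a conjunctive predicate on dependent columns.

```python
import random
from collections import Counter


def chi_squared(pairs):
    """Pearson chi-squared statistic for a list of (a, b) value pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    stat = 0.0
    for a, n_a in left.items():
        for b, n_b in right.items():
            expected = n_a * n_b / n  # count expected if columns were independent
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def mean_square_contingency(pairs):
    """Chi-squared scaled to [0, 1] (assumed normalisation: n * (min(d1, d2) - 1))."""
    d = min(len({a for a, _ in pairs}), len({b for _, b in pairs}))
    if d < 2:
        return 0.0
    return chi_squared(pairs) / (len(pairs) * (d - 1))


def soft_fd_strength(pairs):
    """Distinct-value ratio |C1| / |C1, C2|; 1.0 means C1 determines C2 in the sample."""
    return len({a for a, _ in pairs}) / len(set(pairs))


def selectivities(pairs, a, b):
    """Return (independence estimate, actual) selectivity of C1 = a AND C2 = b."""
    n = len(pairs)
    sel_a = sum(x == a for x, _ in pairs) / n
    sel_b = sum(y == b for _, y in pairs) / n
    actual = sum(p == (a, b) for p in pairs) / n
    return sel_a * sel_b, actual


# Hypothetical table: model determines make; color is independent of model.
rng = random.Random(0)
makes = {"Civic": "Honda", "Accord": "Honda", "Camry": "Toyota", "Corolla": "Toyota"}
colors = ["red", "blue", "black"]
rows = []
for _ in range(10000):
    model = rng.choice(list(makes))
    rows.append((model, makes[model], rng.choice(colors)))

sample = rng.sample(rows, 500)  # analyse a sample, not the full table
model_make = [(r[0], r[1]) for r in sample]
model_color = [(r[0], r[2]) for r in sample]

print("model/make  contingency:", mean_square_contingency(model_make))
print("model/color contingency:", mean_square_contingency(model_color))
print("model => make  strength:", soft_fd_strength(model_make))
print("model => color strength:", soft_fd_strength(model_color))
print("Civic AND Honda (independent, actual):",
      selectivities(model_make, "Civic", "Honda"))
```

In this synthetic data the (model, make) pair scores near the top of both measures while (model, color) scores near the bottom, so only the former would be a candidate for a column-group statistic. Working from a fixed-size sample keeps the cost of each pairwise test independent of the table size.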