Universality of Serial Histograms

Abstract

Many current relational database systems use some form of histograms to approximate the frequency distribution of values in the attributes of relations and based on them estimate query result sizes and access plan costs. The errors that exist in the histogram approximations directly or transitively affect many estimates derived by the database system. We identify the class of serial histograms and demonstrate that they are optimal for reducing the query result size error for several classes of queries when the actual query result size (and hence the value of that error) reaches some extreme. Specifically, serial histograms are shown to be optimal for arbitrary tree equality-join queries when the query result size is maximized, whether or not the attribute independence assumption holds, and when the query result size is minimized and the attribute independence assumption holds. We also show that the expected error for any such query is always zero under all histograms, and thus argue that histograms should be chosen based on the reduction of the extreme-cases error, since reduction of the expected error is meaningless.
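
The effect described above can be illustrated with a small sketch. It rests on several assumptions not stated in the abstract: a histogram is taken to replace every frequency in a bucket by the bucket average, a serial histogram is taken to be one whose buckets group values that are adjacent in frequency order, and the relation contents and function names (`join_size`, `approximate`, `serial_buckets`) are hypothetical. The example uses a two-relation equality join with identical frequency distributions, a case in which the result size is as large as it can be for those frequencies.

```python
def join_size(freq_r, freq_s):
    """Size of an equality join on one attribute: for each shared value,
    multiply its frequency in R by its frequency in S, then sum."""
    return sum(f * freq_s.get(v, 0) for v, f in freq_r.items())


def approximate(freq, buckets):
    """Histogram approximation (assumed): every value in a bucket is
    assigned the average frequency of that bucket."""
    approx = {}
    for bucket in buckets:
        avg = sum(freq[v] for v in bucket) / len(bucket)
        for v in bucket:
            approx[v] = avg
    return approx


def serial_buckets(freq, sizes):
    """Build buckets that are contiguous in frequency order
    (the assumed meaning of 'serial')."""
    ordered = sorted(freq, key=freq.get)
    buckets, start = [], 0
    for n in sizes:
        buckets.append(ordered[start:start + n])
        start += n
    return buckets


# Hypothetical frequency distribution shared by both join attributes.
freq = {"a": 10, "b": 8, "c": 2, "d": 1}

exact = join_size(freq, freq)

# Serial two-bucket histogram: {d, c} and {b, a}.
serial = approximate(freq, serial_buckets(freq, [2, 2]))
serial_estimate = join_size(serial, serial)

# Non-serial two-bucket histogram: mixes high and low frequencies.
non_serial = approximate(freq, [["a", "d"], ["b", "c"]])
non_serial_estimate = join_size(non_serial, non_serial)

print(exact, serial_estimate, non_serial_estimate)
```

For this toy input the serial histogram's estimate is far closer to the exact join size than the non-serial one, which is consistent with the abstract's claim for the maximized-result case. It is a single data point, not a demonstration of optimality.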