【24h】

Optimal and Approximate Computation of Summary Statistics for Range Aggregates

机译:范围汇总的摘要统计量的最佳和近似计算

获取原文
获取原文并翻译 | 示例

摘要

Fast estimates for aggregate queries are useful in database query optimization, approximate query answering and online query processing. Hence, there has been a lot of focus on "selectivity estimation", that is, computing summary statistics on the underlying data and using that to answer aggregate queries fast and to a reasonable approximation. We present two sets of results for range aggregate queries, which are amongst the most common queries. First, we focus on a histogram as summary statistics and present algorithms for constructing histograms that are prov-ably optimal (or provably approximate) for range queries; these algorithms take (pseudo-) polynomial time. These are the first known optimality or approximation results for arbitrary range queries; previously known results were optimal only for restricted range queries (such as equality queries, hierarchical or prefix range queries). Second, we focus on wavelet-based representations as summary statistics and present fast algorithms for picking wavelet statistics that are provably optimal for range queries. No previously-known wavelet-based methods have this property. We perform an experimental study of the various summary representations show the benefits of our algorithms over the known methods.
机译:聚合查询的快速估计在数据库查询优化,近似查询答复和在线查询处理中很有用。因此,人们一直非常关注“选择性估计”,即计算基础数据的摘要统计信息,并使用该统计信息来快速汇总汇总查询并达到合理的近似值。我们提供了范围聚合查询的两组结果,这是最常见的查询之一。首先,我们专注于直方图作为摘要统计数据,并介绍用于构建对于范围查询可证明是最佳(或可证明是近似)的直方图的算法;这些算法需要(伪)多项式时间。这些是任意范围查询的第一个已知的最优性或近似结果。先前已知的结果仅对有限范围的查询(例如相等性查询,分层或前缀范围查询)是最佳的。其次,我们将重点放在基于小波的表示形式上作为摘要统计信息,并提出用于选择小波统计信息的快速算法,该算法被证明是范围查询的最佳选择。没有以前已知的基于小波的方法具有此属性。我们对各种摘要表示进行了实验研究,显示了我们的算法比已知方法的优势。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号