Conference: ACM SIGMOD International Conference on Management of Data

Approximate Computation of Multidimensional Aggregates of Sparse Data Using Wavelets


Abstract

Computing multidimensional aggregates in high dimensions is a performance bottleneck for many OLAP applications. Obtaining the exact answer to an aggregation query can be prohibitively expensive in terms of time and/or storage space in a data warehouse environment. It is advantageous to have fast, approximate answers to OLAP aggregation queries. In this paper, we present a novel method that provides approximate answers to high-dimensional OLAP aggregation queries in massive sparse data sets in a time-efficient and space-efficient manner. We construct a compact data cube, which is an approximate and space-efficient representation of the underlying multidimensional array, based upon a multiresolution wavelet decomposition. In the on-line phase, each aggregation query can generally be answered using the compact data cube in one I/O or a small number of I/Os, depending upon the desired accuracy. We present two I/O-efficient algorithms to construct the compact data cube for the important case of sparse high-dimensional arrays, which often arise in practice. The traditional histogram methods are infeasible for the massive high-dimensional data sets in OLAP applications. Previously developed wavelet techniques are efficient only for dense data. Our on-line query processing algorithm is very fast and capable of refining answers as the user demands more accuracy. Experiments on real data show that our method provides significantly more accurate results for typical OLAP aggregation queries than other efficient approximation techniques such as random sampling.
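
As a rough illustration of the idea behind the compact data cube, the sketch below applies a one-dimensional Haar wavelet decomposition to a small array, keeps only the largest coefficients, and answers a range-sum query from that truncated set. It is a minimal sketch under stated assumptions, not the paper's method: the abstract does not name the wavelet basis, so Haar is assumed here; the function names (`haar_decompose`, `haar_reconstruct`, `keep_largest`, `approximate_range_sum`) and the sample data are hypothetical; and the I/O-efficient construction for sparse high-dimensional arrays is not reproduced.

```python
import math


def haar_decompose(values):
    """Unnormalized 1-D Haar decomposition (assumed basis, not from the abstract).

    Returns [overall average, coarsest detail, ..., finest details].
    The input length must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = [float(v) for v in values]
    details = []
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        diffs = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details = diffs + details  # coarser levels end up in front
        current = averages
    return current + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose: expand each average with its detail coefficient."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(current)]
        expanded = []
        for avg, diff in zip(current, diffs):
            expanded.extend((avg + diff, avg - diff))
        pos += len(current)
        current = expanded
    return current


def keep_largest(coeffs, k):
    """Keep the k coefficients with the largest normalized magnitude.

    Each coefficient is weighted by sqrt(support size) so that levels are
    comparable; the result is a sparse {index: value} mapping.
    """
    n = len(coeffs)

    def weight(i):
        support = n if i == 0 else n >> (i.bit_length() - 1)
        return abs(coeffs[i]) * math.sqrt(support)

    kept = sorted(range(n), key=weight, reverse=True)[:k]
    return {i: coeffs[i] for i in kept}


def approximate_range_sum(kept, n, lo, hi):
    """Approximate sum of cells lo..hi (inclusive) from the kept coefficients.

    Simplification: the array is rebuilt densely before summing, which a
    real system would avoid.
    """
    dense = [kept.get(i, 0.0) for i in range(n)]
    return sum(haar_reconstruct(dense)[lo:hi + 1])


# Hypothetical sample data, chosen only for illustration.
data = [2, 2, 0, 2, 3, 5, 4, 4]
compact = keep_largest(haar_decompose(data), k=4)
print("exact:", sum(data[2:6]), "approximate:", approximate_range_sum(compact, len(data), 2, 5))
```

Raising `k` brings the reconstruction closer to the original array, which mirrors the abstract's point that answers can be refined as the user demands more accuracy.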
