Approximate computation of multidimensional aggregates of sparse data using wavelets

Abstract

Computing multidimensional aggregates in high dimensions is a performance bottleneck for many OLAP applications. Obtaining the exact answer to an aggregation query can be prohibitively expensive in terms of time and/or storage space in a data warehouse environment. It is advantageous to have fast, approximate answers to OLAP aggregation queries.

In this paper, we present a novel method that provides approximate answers to high-dimensional OLAP aggregation queries in massive sparse data sets in a time-efficient and space-efficient manner. We construct a compact data cube, which is an approximate and space-efficient representation of the underlying multidimensional array, based upon a multiresolution wavelet decomposition. In the on-line phase, each aggregation query can generally be answered using the compact data cube in one I/O or a small number of I/Os, depending upon the desired accuracy.

We present two I/O-efficient algorithms to construct the compact data cube for the important case of sparse high-dimensional arrays, which often arise in practice. The traditional histogram methods are infeasible for the massive high-dimensional data sets in OLAP applications. Previously developed wavelet techniques are efficient only for dense data. Our on-line query processing algorithm is very fast and capable of refining answers as the user demands more accuracy. Experiments on real data show that our method provides significantly more accurate results for typical OLAP aggregation queries than other efficient approximation techniques such as random sampling.
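The general idea of a wavelet-based compact representation can be illustrated with a small sketch. The code below is not the paper's algorithm: it is a one-dimensional, in-memory toy that assumes an orthonormal Haar transform and a simple "keep the k largest coefficients" rule, and all function names (`haar_decompose`, `haar_reconstruct`, `compact`, `approx_range_sum`) are hypothetical. It only shows how a sparse array can be summarized by a few coefficients and how a range-sum answer can be recomputed as more coefficients are retained.

```python
import math

def haar_decompose(values):
    """Orthonormal Haar transform of a list whose length is a power of two."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    coeffs = [float(v) for v in values]
    s = math.sqrt(2.0)
    length = n
    while length > 1:
        half = length // 2
        # Pairwise scaled averages go to the front, details right after them.
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / s for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / s for i in range(half)]
        coeffs[:length] = averages + details
        length = half
    return coeffs

def haar_reconstruct(coeffs):
    """Inverse of haar_decompose."""
    n = len(coeffs)
    out = list(coeffs)
    s = math.sqrt(2.0)
    length = 1
    while length < n:
        merged = []
        for a, d in zip(out[:length], out[length:2 * length]):
            merged.append((a + d) / s)
            merged.append((a - d) / s)
        out[:2 * length] = merged
        length *= 2
    return out

def compact(coeffs, k):
    """Keep the k largest-magnitude coefficients as a sparse {index: value} map."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return {i: coeffs[i] for i in order[:k]}

def approx_range_sum(sparse, n, lo, hi):
    """Approximate sum of cells lo..hi (inclusive) from the retained coefficients."""
    dense = [0.0] * n
    for i, c in sparse.items():
        dense[i] = c
    return sum(haar_reconstruct(dense)[lo:hi + 1])

# A sparse 16-cell array with two non-zero cells (illustrative data only).
data = [0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0]
coeffs = haar_decompose(data)

# Recompute the same query with progressively more retained coefficients.
for k in (2, 4, 8, 16):
    estimate = approx_range_sum(compact(coeffs, k), len(data), 4, 10)
    print(k, round(estimate, 3))
```

Because the transform in this sketch is orthonormal, retaining more of the largest coefficients never increases the squared reconstruction error of the array as a whole, although the error of an individual range sum need not shrink at every step. A disk-resident, multidimensional construction such as the one the abstract describes would require considerably more machinery than shown here.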
