Journal: IEEE Transactions on Knowledge and Data Engineering

Scaling the Construction of Wavelet Synopses for Maximum Error Metrics

Abstract

Modern analytics involve computations over enormous numbers of data records. The volume of data and stringent response-time requirements place increasing emphasis on the efficiency of approximate query processing. A major challenge in recent years has been the construction of synopses that provide a deterministic quality guarantee, often expressed in terms of a maximum error metric. Because it can approximate sharp discontinuities, wavelet decomposition has proved to be a very effective tool for data reduction. However, existing wavelet thresholding schemes that minimize maximum error metrics have impractical complexity for large datasets. Furthermore, they cannot efficiently handle the multi-dimensional version of the problem. To provide a practical solution, we develop parallel algorithms that take advantage of key properties of the wavelet decomposition and allocate tasks to multiple workers. To that end, we present (i) a general framework for the parallelization of existing dynamic programming (DP) algorithms, (ii) a parallel version of one such DP algorithm, and (iii) two highly efficient distributed greedy algorithms that can deal with data of arbitrary dimensionality. Our extensive experiments on both real and synthetic datasets over Hadoop show that the proposed algorithms achieve linear scalability and superior running-time performance compared to their centralized counterparts.
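
To make the terms "wavelet synopsis" and "maximum error metric" concrete, the following is a minimal illustrative sketch, not the paper's method. It assumes the unnormalized Haar wavelet (pairwise averages and half-differences) and a naive rule that keeps the `budget` coefficients of largest absolute value; the abstract does not describe the DP or greedy algorithms in enough detail to reproduce them. All function names here are hypothetical.

```python
def haar_decompose(data):
    """Unnormalized Haar transform: overall average followed by detail
    coefficients from coarsest to finest level (assumed convention)."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details_all = []
    level = list(data)
    while len(level) > 1:
        pairs = range(0, len(level), 2)
        averages = [(level[i] + level[i + 1]) / 2 for i in pairs]
        details = [(level[i] - level[i + 1]) / 2 for i in pairs]
        details_all = details + details_all  # coarser levels go first
        level = averages
    return level + details_all


def haar_reconstruct(coeffs):
    """Invert haar_decompose."""
    level = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(level)]
        pos += len(level)
        expanded = []
        for avg, det in zip(level, details):
            expanded.extend((avg + det, avg - det))
        level = expanded
    return level


def keep_largest(coeffs, budget):
    """Naive synopsis (an assumption, not the paper's scheme): zero out
    all but the `budget` coefficients of largest absolute value."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:budget])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


def max_abs_error(data, approx):
    """Maximum absolute error between the data and its approximation."""
    return max(abs(d - a) for d, a in zip(data, approx))


data = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_decompose(data)        # [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
synopsis = keep_largest(coeffs, 2)   # keeps only 2.75 and -1.25
approx = haar_reconstruct(synopsis)  # [1.5, 1.5, 1.5, 1.5, 4.0, 4.0, 4.0, 4.0]
print(max_abs_error(data, approx))   # 1.5
```

A thresholding scheme that targets a maximum error metric would instead choose which coefficients to retain so that the value returned by `max_abs_error` is as small as possible for the given budget; the naive rule above offers no such guarantee.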
