Histograms and Wavelets on Probabilistic Data

Abstract

There is a growing realization that uncertain information is a first-class citizen in modern database management. As such, we need techniques to correctly and efficiently process uncertain data in database systems. In particular, data reduction techniques that can produce concise, accurate synopses of large probabilistic relations are crucial. Similar to their deterministic relation counterparts, such compact probabilistic data synopses can form the foundation for human understanding and interactive data exploration, probabilistic query planning and optimization, and fast approximate query processing in probabilistic database systems. In this paper, we introduce definitions and algorithms for building histogram- and Haar wavelet-based synopses on probabilistic data. The core problem is to choose a set of histogram bucket boundaries or wavelet coefficients to optimize the accuracy of the approximate representation of a collection of probabilistic tuples under a given error metric. For a variety of different error metrics, we devise efficient algorithms that construct optimal or near-optimal size-B histogram and wavelet synopses. This requires careful analysis of the structure of the probability distributions, and novel extensions of known dynamic programming-based techniques for the deterministic domain. Our experiments show that this approach clearly outperforms simple ideas, such as building summaries for samples drawn from the data distribution, while taking equal or less time.
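
To make the bucket-boundary problem concrete, here is a minimal sketch of a dynamic program for one possible error metric: expected sum-squared error, with one fixed representative value per bucket. It is an illustration under stated assumptions, not the paper's algorithm. The function name `expected_sse_histogram`, the representation of each item as a list of (value, probability) pairs, and the choice of metric are all hypothetical. The sketch relies on the fact that, for a bucket of width w, the representative minimizing the expected squared error is the mean of the items' expected values, giving cost sum E[g_i^2] - (sum E[g_i])^2 / w.

```python
from typing import List, Sequence, Tuple

# Assumed input model: each item carries a small discrete distribution
# over its frequency, as (value, probability) pairs summing to 1.
Pdf = Sequence[Tuple[float, float]]


def expected_sse_histogram(
    items: Sequence[Pdf], num_buckets: int
) -> Tuple[float, List[Tuple[int, int, float]]]:
    """Return (minimum expected SSE, buckets as (start, end, representative)).

    Buckets cover half-open index ranges [start, end).
    """
    n = len(items)
    if not 1 <= num_buckets <= n:
        raise ValueError("num_buckets must be between 1 and len(items)")

    # Prefix sums of E[g_i] and E[g_i^2] allow O(1) bucket-cost queries.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, pdf in enumerate(items):
        s1[i + 1] = s1[i] + sum(v * p for v, p in pdf)
        s2[i + 1] = s2[i] + sum(v * v * p for v, p in pdf)

    def cost(lo: int, hi: int) -> float:
        # Expected SSE of one bucket over [lo, hi) with its best representative.
        mean_sum = s1[hi] - s1[lo]
        return (s2[hi] - s2[lo]) - mean_sum * mean_sum / (hi - lo)

    inf = float("inf")
    # best[b][hi]: minimum expected SSE covering items [0, hi) with b buckets.
    best = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    back = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    best[0][0] = 0.0
    for b in range(1, num_buckets + 1):
        for hi in range(b, n + 1):
            for lo in range(b - 1, hi):
                candidate = best[b - 1][lo] + cost(lo, hi)
                if candidate < best[b][hi]:
                    best[b][hi] = candidate
                    back[b][hi] = lo

    # Walk the back-pointers to recover the chosen boundaries.
    buckets: List[Tuple[int, int, float]] = []
    hi = n
    for b in range(num_buckets, 0, -1):
        lo = back[b][hi]
        buckets.append((lo, hi, (s1[hi] - s1[lo]) / (hi - lo)))
        hi = lo
    buckets.reverse()
    return best[num_buckets][n], buckets
```

This sketch runs in O(B n^2) time. Note that even a single uncertain item contributes its variance to the error, so the optimal cost is generally nonzero on probabilistic data even when every item has its own bucket.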