International Conference on Data Engineering

Approximating a data stream for querying and estimation: algorithms and performance evaluation

Abstract

Obtaining fast, good-quality approximations of data distributions is a problem of central interest in database management. A variety of popular database applications, including approximate querying, similarity searching, and data mining, rely on such good-quality approximations in most application domains. Histogram-based approximation is a very popular method in database theory and practice for succinctly representing a data distribution in a space-efficient manner. In this paper, we place the problem of histogram construction into perspective and generalize it by lifting the requirement of a finite data set and/or a known data set size. We consider the case of an infinite data set, in which data arrive continuously, forming an infinite data stream. In this context, we present the first single-pass algorithms capable of constructing histograms of provably good quality. We present algorithms for the fixed-window variant of the basic histogram construction problem that support incremental maintenance of the histograms. The proposed algorithms trade accuracy for speed and allow a graceful tradeoff between the two, based on application requirements. For approximate queries on infinite data streams, we present a detailed experimental evaluation comparing our algorithms with other applicable techniques on real data sets, demonstrating the superiority of our proposal.
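To make the fixed-window histogram construction problem concrete, here is a minimal Python sketch under stated assumptions. It uses the sum-squared-error (V-optimal) criterion, a common choice that the abstract itself does not name, and the classic exact O(n²B) dynamic program over the contents of a fixed-size window. It is not the paper's single-pass algorithm: it rebuilds an exact histogram from scratch on every call, which is the kind of cost that approximate, incremental maintenance is meant to avoid. The names `v_optimal_histogram`, `WindowHistogram`, and `estimate_range_sum` are hypothetical helpers, and range-sum estimation assumes values are spread uniformly within each bucket.

```python
from collections import deque
from itertools import accumulate

# Hypothetical illustration: exact V-optimal (minimum sum-squared-error) histogram
# recomputed over a fixed window. This is NOT the paper's single-pass algorithm.


def v_optimal_histogram(values, num_buckets):
    """Return (sse, buckets), where each bucket is (start, end, mean) over values[start:end]."""
    n = len(values)
    if n == 0:
        return 0.0, []
    b = min(num_buckets, n)
    # Prefix sums of values and squares give each bucket's SSE in O(1).
    s = [0.0] + list(accumulate(values))
    sq = [0.0] + list(accumulate(v * v for v in values))

    def bucket_sse(i, j):
        total = s[j] - s[i]
        return (sq[j] - sq[i]) - total * total / (j - i)

    inf = float("inf")
    # err[k][j]: least SSE for values[:j] using exactly k buckets.
    # cut[k][j]: start index of the last bucket in that optimal solution.
    err = [[inf] * (n + 1) for _ in range(b + 1)]
    cut = [[0] * (n + 1) for _ in range(b + 1)]
    err[0][0] = 0.0
    for k in range(1, b + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cand = err[k - 1][i] + bucket_sse(i, j)
                if cand < err[k][j]:
                    err[k][j] = cand
                    cut[k][j] = i
    # Walk the cut table backwards to recover bucket boundaries.
    buckets = []
    j = n
    for k in range(b, 0, -1):
        i = cut[k][j]
        buckets.append((i, j, (s[j] - s[i]) / (j - i)))
        j = i
    buckets.reverse()
    return err[b][n], buckets


class WindowHistogram:
    """Keeps the most recent `window` stream values (hypothetical helper)."""

    def __init__(self, window, num_buckets):
        self.values = deque(maxlen=window)  # oldest value is evicted when full
        self.num_buckets = num_buckets

    def append(self, x):
        self.values.append(x)

    def histogram(self):
        # Full rebuild on every call: O(window^2 * num_buckets) time.
        return v_optimal_histogram(list(self.values), self.num_buckets)


def estimate_range_sum(buckets, lo, hi):
    """Approximate sum of window[lo:hi], assuming uniform values inside each bucket."""
    total = 0.0
    for start, end, mean in buckets:
        overlap = min(end, hi) - max(start, lo)
        if overlap > 0:
            total += overlap * mean
    return total


w = WindowHistogram(window=8, num_buckets=3)
for x in [9, 1, 1, 1, 5, 5, 5, 5, 5, 1]:
    w.append(x)
sse, buckets = w.histogram()
print(sse, buckets)  # → 0.0 [(0, 2, 1.0), (2, 7, 5.0), (7, 8, 1.0)]
print(estimate_range_sum(buckets, 1, 4))  # → 11.0
```

Bucket boundaries are indices relative to the start of the current window, so they shift as old values are evicted; the first two stream values (9 and 1) have already fallen out of the 8-value window in the example.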