Sketch-Based Multi-query Processing over Data Streams

Abstract

Recent years have witnessed an increasing interest in designing algorithms for querying and analyzing streaming data (i.e., data that is seen only once in a fixed order) with only limited memory. Providing (perhaps approximate) answers to queries over such continuous data streams is a crucial requirement for many application environments; examples include large telecom and IP network installations where performance data from different parts of the network needs to be continuously collected and analyzed. Randomized techniques, based on computing small "sketch" synopses for each stream, have recently been shown to be a very effective tool for approximating the result of a single SQL query over streaming data tuples. In this paper, we investigate the problems arising when data-stream sketches are used to process multiple such queries concurrently. We demonstrate that, in the presence of multiple query expressions, intelligently sharing sketches among concurrent query evaluations can result in substantial improvements in the utilization of the available sketching space and the quality of the resulting approximation error guarantees. We provide necessary and sufficient conditions for multi-query sketch sharing that guarantee the correctness of the result-estimation process. We also prove that optimal sketch sharing typically gives rise to NP-hard questions, and we propose novel heuristic algorithms for finding good sketch-sharing configurations in practice. Results from our experimental study with realistic workloads verify the effectiveness of our approach, clearly demonstrating the benefits of our sketch-sharing methodology.
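The abstract describes two ideas: per-stream randomized sketches that approximate the result of an SQL query, and sharing one stream's sketch across concurrent queries. The Python sketch below illustrates both under stated assumptions. It uses AGMS-style ("tug-of-war") atomic sketches, with ±1 variables drawn from a 4-wise independent polynomial hash, to estimate the equi-join sizes COUNT(*) of R ⋈ S and R ⋈ T on a single attribute A. It then reuses one sketch of R for both queries. Everything here is hypothetical and illustrative, not the paper's implementation: the class and function names (`SignFamily`, `AtomicSketch`, `estimate_join`), the streams R, S and T, and the group sizes. The paper's necessary and sufficient sharing conditions and its heuristics for choosing sharing configurations are not reproduced.

```python
import random
import statistics
from collections import Counter

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1


class SignFamily:
    """+/-1 variables xi(v) from a random cubic polynomial over GF(PRIME).

    A random degree-3 polynomial is a 4-wise independent hash; its parity bit
    is used as the sign (the bias of order 1/PRIME is ignored here).
    """

    def __init__(self, rng):
        self.coeffs = [rng.randrange(PRIME) for _ in range(4)]

    def sign(self, value):
        h = 0
        for c in self.coeffs:  # Horner's rule
            h = (h * value + c) % PRIME
        return 1 if h & 1 else -1


def make_families(copies, seed):
    """Independent xi families, one per sketch copy (hypothetical helper)."""
    rng = random.Random(seed)
    return [SignFamily(rng) for _ in range(copies)]


class AtomicSketch:
    """Counters X_i = sum over stream tuples of xi_i(join-attribute value)."""

    def __init__(self, families):
        self.families = families
        self.counters = [0] * len(families)

    def update(self, value, weight=1):  # weight=-1 would process a deletion
        for i, fam in enumerate(self.families):
            self.counters[i] += weight * fam.sign(value)


def estimate_join(sk_a, sk_b, groups):
    """Median of `groups` averages of X_a * X_b (AGMS-style boosting)."""
    if sk_a.families is not sk_b.families:
        raise ValueError("both sides of an equi-join must use the same xi families")
    products = [x * y for x, y in zip(sk_a.counters, sk_b.counters)]
    size = len(products) // groups
    means = [statistics.fmean(products[g * size:(g + 1) * size]) for g in range(groups)]
    return statistics.median(means)


def exact_join(r, s):
    """Exact equi-join size, for comparison only."""
    counts_s = Counter(s)
    return sum(counts_s[v] for v in r)


GROUPS, PER_GROUP = 7, 100
families_a = make_families(GROUPS * PER_GROUP, seed=42)  # one xi family set for attribute A

# Hypothetical streams of join-attribute values (one value per arriving tuple).
R = [v for v in range(20) for _ in range(1 + 3 * (v % 5))]
S = [v for v in range(20) for _ in range(1 + 2 * (v % 4))]
T = [v for v in range(20) for _ in range(5)]

# Sketch sharing: R is summarized once, and that sketch serves both queries.
sketches = {name: AtomicSketch(families_a) for name in ("R", "S", "T")}
for name, stream in (("R", R), ("S", S), ("T", T)):
    for v in stream:  # single pass over each stream
        sketches[name].update(v)

q1 = estimate_join(sketches["R"], sketches["S"], GROUPS)  # Q1: R join S on A
q2 = estimate_join(sketches["R"], sketches["T"], GROUPS)  # Q2: R join T on A
print(f"Q1 estimate {q1:.0f} vs exact {exact_join(R, S)}")
print(f"Q2 estimate {q2:.0f} vs exact {exact_join(R, T)}")
print("counters with sharing:", 3 * GROUPS * PER_GROUP, "without:", 4 * GROUPS * PER_GROUP)
```

Each query in this toy setup has a single equi-join predicate on A. Both estimators can therefore draw on the same ξ families and the same sketch of R, so three sketches replace the four that per-query sketching would need. The saved counters could instead be spent on more copies per query, which would tighten the error bounds.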