首页> 外文会议>SIGMOD/PODS 2007 >Sharing Aggregate Computation for Distributed Queries
【24h】

Sharing Aggregate Computation for Distributed Queries

机译:共享分布式计算的分布式查询

获取原文

摘要

An emerging challenge in modern distributed querying is to effi- ciently process multiple continuous aggregation queries simultaneously. Processing each query independently may be infeasible, so multi-query optimizations are critical for sharing work across queries. The challenge is to identify overlapping computations that may not be obvious in the queries themselves. In this paper, we reveal new opportunities for sharing work in the context of distributed aggregation queries that vary in their selection predicates. We identify settings in which a large set of q such queries can be answered by executing k《 q different queries. The k queries are revealed by analyzing a boolean matrix capturing the connection between data and the queries that they satisfy, in a manner akin to familiar techniques like Gaussian elimination. Indeed, we identify a class of linear aggregate functions (including SUM, COUNT and AVERAGE), and show that the sharing potential for such queries can be optimally recovered using standard matrix decompositions from computational linear algebra. For some other typical aggregation functions (including MIN and MAX) we find that optimal sharing maps to the NP-hard set basis problem. However, for those scenarios, we present a family of heuristic algorithms and demonstrate that they perform well for moderate-sized matrices. We also present a dynamic distributed system architecture to exploit sharing opportunities, and experimentally evaluate the benefits of our techniques via a novel, flexible random workload generator we develop for this setting.
机译:现代分布式查询中的一个新挑战是有效地同时处理多个连续聚合查询。独立处理每个查询可能是不可行的,因此多查询优化对于跨查询共享工作至关重要。挑战在于识别在查询本身中可能不明显的重叠计算。在本文中,我们揭示了在分布式聚合查询的选择谓词有所不同的情况下共享工作的新机会。我们确定设置,通过执行k《 q个不同的查询,可以回答大量的此类查询。通过分析一个布尔矩阵来揭示第k个查询,该布尔矩阵捕获数据和它们满足的查询之间的联系,其方式类似于诸如高斯消除之类的熟悉技术。实际上,我们确定了一类线性聚合函数(包括SUM,COUNT和AVERAGE),并表明可以使用计算线性代数中的标准矩阵分解来最佳地恢复此类查询的共享潜力。对于其他一些典型的聚合函数(包括MIN和MAX),我们发现最优共享映射到NP硬集基础问题。但是,对于这些情况,我们提出了一系列启发式算法,并证明了它们在中等大小的矩阵中表现良好。我们还提出了一种动态的分布式系统架构,以利用共享机会,并通过针对该环境开发的新颖,灵活的随机工作负载生成器,通过实验评估我们的技术的优势。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号