Conference: SIGMOD/PODS

Progressive Optimization in a Shared-Nothing Parallel Database


Abstract

Commercial enterprise data warehouses are typically implemented on parallel databases because of the inherent scalability and performance limitations of a serial architecture. Queries in such large data warehouses can contain complex predicates as well as multiple joins, and the execution plans the optimizer generates for them may be suboptimal because row cardinalities are mis-estimated. Progressive optimization (POP) is an approach that detects cardinality estimation errors by monitoring actual cardinalities at runtime and recovers by triggering re-optimization with the measured cardinalities. However, the original POP solution assumes a serial processing architecture, and its core ideas cannot be readily applied to a parallel shared-nothing environment. Extending serial POP to a parallel environment is challenging because we need to determine when and how to trigger re-optimization based on cardinalities collected from multiple independent nodes. In this paper, we present a comprehensive and practical solution to this problem, including several novel voting schemes for deciding whether to trigger re-optimization, a mechanism to reuse local intermediate results across nodes as a partitioned materialized view, several flavors of parallel checkpoint operators, and parallel checkpoint processing methods that use efficient communication protocols. This solution has been prototyped in a leading commercial parallel DBMS. We have performed extensive experiments using the TPC-H benchmark and a real-world database. Experimental results show that our solution has negligible runtime overhead and speeds up complex OLAP queries by up to a factor of 22.
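
To make the idea of per-node checkpoints and voting concrete, the following is a minimal illustrative sketch, not the paper's algorithm. The abstract does not specify the voting schemes, so the simple majority rule, the multiplicative tolerance band, and all names here (`Checkpoint`, `violated`, `majority_vote`) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical checkpoint: an optimizer estimate plus a tolerated band."""

    estimated_rows: int
    tolerance: float  # assumed multiplicative slack; 2.0 means "within 2x"

    def violated(self, actual_rows: int) -> bool:
        """Return True if the observed cardinality falls outside the band."""
        low = self.estimated_rows / self.tolerance
        high = self.estimated_rows * self.tolerance
        return not (low <= actual_rows <= high)


def majority_vote(checkpoint: Checkpoint, actual_rows_per_node: Sequence[int]) -> bool:
    """Assumed voting rule: re-optimize only if a strict majority of nodes
    observe a cardinality outside the checkpoint's tolerated band."""
    votes = [checkpoint.violated(rows) for rows in actual_rows_per_node]
    return sum(votes) * 2 > len(votes)


if __name__ == "__main__":
    cp = Checkpoint(estimated_rows=1000, tolerance=2.0)
    # One of three nodes is far off the estimate: no strict majority.
    print(majority_vote(cp, [900, 1100, 5000]))
    # Two of three nodes are far off the estimate: strict majority.
    print(majority_vote(cp, [9000, 100, 1200]))
```

A majority rule is only one plausible choice; a scheme could equally let a single node trigger re-optimization or weigh votes by the amount of data on each node. The sketch leaves those alternatives open.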
