Distributed and Parallel Databases

One size does not fit all: accelerating OLAP workloads with GPUs



Abstract

GPUs have been considered one of the next-generation platforms for real-time query processing in databases. In this paper we empirically demonstrate that representative GPU databases [e.g., OmniSci (Open Source Analytical Database & SQL Engine, 2019)] can be slower than representative in-memory databases [e.g., HyPer (Neumann and Leis, IEEE Data Eng Bull 37(1):3-11, 2014)] on typical OLAP workloads (the Star Schema Benchmark), even when the actual dataset of each query fits entirely in GPU memory. We therefore argue that GPU database designs should not be one-size-fits-all: a general-purpose GPU database engine may not suit OLAP workloads without carefully designed GPU memory assignment and GPU computing locality. To achieve better GPU OLAP performance, we need to re-organize the OLAP operators and re-optimize the OLAP model. In particular, we propose a 3-layer OLAP model that matches heterogeneous computing platforms. The core idea is to maximize data and computing locality on the designated hardware. We design a vector grouping algorithm for the data-intensive workload, which we show is best assigned adaptively to the CPU platform. We design a top-down query-plan-tree strategy that guarantees optimal operation in the final stage and pushes the respective optimizations down to the lower layers for global optimization gains. With this strategy, we design a 3-stage processing model (the OLAP acceleration engine) for hybrid CPU-GPU platforms, in which the computing-intensive star-join stage is accelerated by the GPU and the data-intensive grouping & aggregation stage is handled by the CPU. This design maximizes the locality of the different workloads and simplifies the GPU acceleration implementation. Our experimental results show that, with vector grouping and a GPU-accelerated star-join implementation, the OLAP acceleration engine runs 1.9x, 3.05x and 3.92x faster than HyPer, OmniSci GPU and OmniSci CPU, respectively, in the SSB evaluation at scale factor SF = 100.
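As a rough illustration of the 3-stage model the abstract describes, the sketch below stands in for the star-join stage (GPU-accelerated in the paper) with a plain-Python foreign-key mapping, and for the vector grouping & aggregation stage with a dense aggregation vector. All function names, table layouts, and the group-code encoding are assumptions for illustration only, not the authors' implementation.

```python
# Minimal sketch of the star-join -> vector grouping -> aggregation
# pipeline. In the paper the star-join stage runs on the GPU; plain
# Python stands in for it here.

def star_join(fact_fks, dim_maps):
    """Stage 1: map each fact row's foreign keys through per-dimension
    vectors. dim_maps[d][fk] is a small group code, or None if the
    dimension's predicate rejects that key. Returns a list of per-row
    code lists, with None for rows filtered out by any dimension."""
    out = []
    for fks in fact_fks:
        codes = []
        for d, fk in enumerate(fks):
            code = dim_maps[d][fk]
            if code is None:          # row fails this dimension's filter
                codes = None
                break
            codes.append(code)
        out.append(codes)
    return out

def group_aggregate(group_codes, measures, dims_card):
    """Stages 2-3 (vector grouping + aggregation): combine the
    per-dimension codes into a single dense index into a pre-sized
    aggregation vector, then accumulate the measure values."""
    size = 1
    for card in dims_card:
        size *= card
    agg = [0] * size
    for codes, m in zip(group_codes, measures):
        if codes is None:             # filtered rows contribute nothing
            continue
        idx = 0
        for code, card in zip(codes, dims_card):
            idx = idx * card + code   # mixed-radix group index
        agg[idx] += m
    return agg
```

For example, with two dimensions of group cardinalities 2 and 1, a fact row whose key is rejected by a dimension predicate is dropped during the star-join, while surviving rows are accumulated at their combined group index in the dense aggregation vector.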

Bibliographic information

  • Source
    《Distributed and Parallel Databases》 | 2020, Issue 4 | pp. 995-1037 | 43 pages
  • Author affiliations

    Renmin Univ China, DEKE Lab, Beijing, Peoples R China | Renmin Univ China, Sch Informat, Beijing, Peoples R China;

    Natl Satellite Meteorol Ctr China, Beijing, Peoples R China;

    Univ Helsinki, Dept Comp Sci, Helsinki, Finland;

    Renmin Univ China, DEKE Lab, Beijing, Peoples R China | Renmin Univ China, Sch Informat, Beijing, Peoples R China;

    Renmin Univ China, DEKE Lab, Beijing, Peoples R China | Renmin Univ China, Sch Informat, Beijing, Peoples R China;

    Renmin Univ China, DEKE Lab, Beijing, Peoples R China | Renmin Univ China, Sch Informat, Beijing, Peoples R China;

  • Indexing
  • Format: PDF
  • Language: eng
  • CLC classification
  • Keywords

    GPU; OLAP; Layered OLAP; Vector grouping; 3-layer OLAP model;

