Distributed and Parallel Databases

One size does not fit all: accelerating OLAP workloads with GPUs



Abstract

GPUs have been considered one of the next-generation platforms for real-time query processing in databases. In this paper we empirically demonstrate that representative GPU databases [e.g., OmniSci (Open Source Analytical Database & SQL Engine, 2019)] can be slower than representative in-memory databases [e.g., HyPer (Neumann and Leis, IEEE Data Eng Bull 37(1):3-11, 2014)] on typical OLAP workloads (the Star Schema Benchmark), even when the actual dataset of each query fits entirely in GPU memory. We therefore argue that GPU database designs should not be one-size-fits-all: a general-purpose GPU database engine may not suit OLAP workloads without carefully designed GPU memory assignment and GPU computing locality. To achieve better GPU OLAP performance, we need to re-organize the OLAP operators and re-optimize the OLAP model. In particular, we propose a 3-layer OLAP model that matches heterogeneous computing platforms. The core idea is to maximize data and computing locality on the designated hardware. We design a vector grouping algorithm for the data-intensive workload, which we show is best assigned adaptively to the CPU platform. We design a top-down query-plan-tree strategy that guarantees optimal operation in the final stage and pushes the respective optimizations down to the lower layers for global optimization gains. With this strategy, we design a 3-stage processing model (the OLAP acceleration engine) for hybrid CPU-GPU platforms, in which the computing-intensive star-join stage is accelerated by the GPU and the data-intensive grouping & aggregation stage is handled by the CPU. This design maximizes the locality of the different workloads and simplifies the GPU acceleration implementation. Our experimental results show that, with vector grouping and a GPU-accelerated star-join implementation, the OLAP acceleration engine runs 1.9x, 3.05x and 3.92x faster than HyPer, OmniSci GPU and OmniSci CPU, respectively, in the SSB evaluation at scale factor SF = 100.
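As a rough illustration of the 3-stage model the abstract describes, the sketch below stands in for the star-join stage (GPU-accelerated in the paper) with a plain-Python foreign-key mapping, and for the vector grouping & aggregation stage with a dense aggregation vector. All function names, table layouts, and the group-code encoding are assumptions for illustration only, not the authors' implementation.

```python
# Minimal sketch of the star-join -> vector grouping -> aggregation
# pipeline. In the paper the star-join stage runs on the GPU; plain
# Python stands in for it here.

def star_join(fact_fks, dim_maps):
    """Stage 1: map each fact row's foreign keys through per-dimension
    vectors. dim_maps[d][fk] is a small group code, or None if the
    dimension's predicate rejects that key. Returns a list of per-row
    code lists, with None for rows filtered out by any dimension."""
    out = []
    for fks in fact_fks:
        codes = []
        for d, fk in enumerate(fks):
            code = dim_maps[d][fk]
            if code is None:          # row fails this dimension's filter
                codes = None
                break
            codes.append(code)
        out.append(codes)
    return out

def group_aggregate(group_codes, measures, dims_card):
    """Stages 2-3 (vector grouping + aggregation): combine the
    per-dimension codes into a single dense index into a pre-sized
    aggregation vector, then accumulate the measure values."""
    size = 1
    for card in dims_card:
        size *= card
    agg = [0] * size
    for codes, m in zip(group_codes, measures):
        if codes is None:             # filtered rows contribute nothing
            continue
        idx = 0
        for code, card in zip(codes, dims_card):
            idx = idx * card + code   # mixed-radix group index
        agg[idx] += m
    return agg
```

For example, with two dimensions of group cardinalities 2 and 1, a fact row whose key is rejected by a dimension predicate is dropped during the star-join, while surviving rows are accumulated at their combined group index in the dense aggregation vector.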

Bibliographic information

  • Source
    《Distributed and Parallel Databases》 | 2020, Issue 4 | pp. 995-1037 | 43 pages
  • Author affiliations

    Renmin Univ China, DEKE Lab, Beijing, Peoples R China | Renmin Univ China, Sch Informat, Beijing, Peoples R China;

    Natl Satellite Meteorol Ctr China, Beijing, Peoples R China;

    Univ Helsinki, Dept Comp Sci, Helsinki, Finland;

    Renmin Univ China, DEKE Lab, Beijing, Peoples R China | Renmin Univ China, Sch Informat, Beijing, Peoples R China;

    Renmin Univ China, DEKE Lab, Beijing, Peoples R China | Renmin Univ China, Sch Informat, Beijing, Peoples R China;

    Renmin Univ China, DEKE Lab, Beijing, Peoples R China | Renmin Univ China, Sch Informat, Beijing, Peoples R China;

  • Indexing
  • Format: PDF
  • Language: eng
  • CLC classification
  • Keywords

    GPU; OLAP; Layered OLAP; Vector grouping; 3-layer OLAP model;

