Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

Mohammed Sourouri; Scott B. Baden; Xing Cai


Abstract

We present a new compiler framework for truly heterogeneous 3D stencil computation on GPU clusters. Our framework consists of a simple directive-based programming model and a tightly integrated source-to-source compiler. Annotated with a small number of directives, sequential stencil C codes can be automatically parallelized for large-scale GPU clusters. The most distinctive feature of the compiler is its capability to generate hybrid MPI+CUDA+OpenMP code that uses concurrent CPU+GPU computing to unleash the full potential of powerful GPU clusters. The auto-generated hybrid codes hide the overhead of the various kinds of data movement by overlapping it with computation. Test results on the Titan supercomputer and the Wilkes cluster show that auto-translated codes can achieve about 90% of the performance of highly optimized handwritten codes, for both a simple stencil benchmark and a real-world application in cardiac modeling. The user-friendliness and performance of our domain-specific compiler framework allow harnessing the full power of GPU-accelerated supercomputing without painstaking coding effort.
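
For readers unfamiliar with the term, the kind of sequential C stencil loop the abstract refers to can be sketched as follows. This is a minimal illustration, not code from the paper: the function name stencil7, the helper idx, and the coefficients c0 and c1 are assumptions made here, and Panda's own directives are not shown because this record does not give their syntax.

```c
#include <stddef.h>

/* Hypothetical helper: flatten (i, j, k) into a 1D index for a grid
 * stored contiguously with k varying fastest. */
static size_t idx(size_t i, size_t j, size_t k, size_t ny, size_t nz)
{
    return (i * ny + j) * nz + k;
}

/* One sweep of a 7-point 3D stencil over the interior of an
 * nx * ny * nz grid. Boundary points of `out` are left untouched.
 * c0 weights the centre point, c1 weights each of the six neighbours
 * (both are illustrative parameters, not taken from the paper). */
void stencil7(const double *in, double *out,
              size_t nx, size_t ny, size_t nz,
              double c0, double c1)
{
    for (size_t i = 1; i + 1 < nx; i++) {
        for (size_t j = 1; j + 1 < ny; j++) {
            for (size_t k = 1; k + 1 < nz; k++) {
                out[idx(i, j, k, ny, nz)] =
                    c0 * in[idx(i, j, k, ny, nz)] +
                    c1 * (in[idx(i - 1, j, k, ny, nz)] +
                          in[idx(i + 1, j, k, ny, nz)] +
                          in[idx(i, j - 1, k, ny, nz)] +
                          in[idx(i, j + 1, k, ny, nz)] +
                          in[idx(i, j, k - 1, ny, nz)] +
                          in[idx(i, j, k + 1, ny, nz)]);
            }
        }
    }
}
```

A loop nest of this shape is what a hybrid MPI+CUDA+OpenMP code would distribute across nodes, GPUs, and CPU cores; how Panda does so is described in the paper itself and is not reproduced here.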


Bibliographic details

Source
International Journal of Parallel Programming | 2017, Issue 3 | pp. 711-729 | 19 pages
Authors
Mohammed Sourouri; Scott B. Baden; Xing Cai
Author affiliations

Simula Research Laboratory, Oslo, Norway; Department of Informatics, University of Oslo, Oslo, Norway

Department of Computer Science and Engineering, University of California, San Diego, CA, USA

Simula Research Laboratory, Oslo, Norway; Department of Informatics, University of Oslo, Oslo, Norway

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
Source-to-source translation; Code generation; Code optimization; CUDA; OpenMP; MPI; Stencil computation; Heterogeneous computing; CPU+GPU computing;


