Journal: International Journal of Parallel Programming

Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

Abstract

We present a new compiler framework for truly heterogeneous 3D stencil computation on GPU clusters. Our framework consists of a simple directive-based programming model and a tightly integrated source-to-source compiler. Annotated with a small number of directives, sequential stencil C codes can be automatically parallelized for large-scale GPU clusters. The most distinctive feature of the compiler is its capability to generate hybrid MPI+CUDA+OpenMP code that uses concurrent CPU+GPU computing to unleash the full potential of powerful GPU clusters. The auto-generated hybrid codes hide the overhead of various data movements by overlapping them with computation. Test results on the Titan supercomputer and the Wilkes cluster show that auto-translated codes can achieve about 90% of the performance of highly optimized handwritten codes, for both a simple stencil benchmark and a real-world application in cardiac modeling. The user-friendliness and performance of our domain-specific compiler framework allow harnessing the full power of GPU-accelerated supercomputing without painstaking coding effort.
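For readers unfamiliar with the term, the kind of sequential 3D stencil C code that the abstract describes as compiler input might look like the sketch below. It is only an illustration: the 7-point pattern, the function names `idx` and `stencil7_sweep`, and the coefficients `c0` and `c1` are assumptions, and Panda's directive syntax is deliberately not shown because the abstract does not give it.

```c
#include <assert.h>
#include <stddef.h>

/* Map (i, j, k) to a flat index in an nx*ny*nz grid, with k varying fastest.
   Hypothetical helper, not taken from the paper. */
static size_t idx(size_t i, size_t j, size_t k, size_t ny, size_t nz) {
    return (i * ny + j) * nz + k;
}

/* One sweep of a 7-point stencil over the interior points of the grid.
   Boundary points of `out` are left untouched.
   c0 weights the centre point, c1 weights each of the six neighbours. */
void stencil7_sweep(const double *in, double *out,
                    size_t nx, size_t ny, size_t nz,
                    double c0, double c1) {
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    for (size_t i = 1; i + 1 < nx; ++i) {
        for (size_t j = 1; j + 1 < ny; ++j) {
            for (size_t k = 1; k + 1 < nz; ++k) {
                out[idx(i, j, k, ny, nz)] =
                    c0 * in[idx(i, j, k, ny, nz)] +
                    c1 * (in[idx(i - 1, j, k, ny, nz)] +
                          in[idx(i + 1, j, k, ny, nz)] +
                          in[idx(i, j - 1, k, ny, nz)] +
                          in[idx(i, j + 1, k, ny, nz)] +
                          in[idx(i, j, k - 1, ny, nz)] +
                          in[idx(i, j, k + 1, ny, nz)]);
            }
        }
    }
}
```

A time-stepping driver would call `stencil7_sweep` repeatedly and swap `in` and `out` between steps. A loop nest of this shape is the sort of code the abstract says can be annotated and translated into hybrid MPI+CUDA+OpenMP code.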