Porting and scaling OpenACC applications on massively-parallel, GPU-accelerated supercomputers

A. Hart; R. Ansaloni; A. Gray


Abstract

An increasing number of massively-parallel supercomputers are based on heterogeneous node architectures combining traditional, powerful multicore CPUs with energy-efficient GPU accelerators. Such systems offer high computational performance with modest power consumption. As the industry trend of closer integration of CPU and GPU silicon continues, these architectures are a possible template for future exascale systems. Given the longevity of large-scale parallel HPC applications, it is important that there is a mechanism for easy migration to such hybrid systems. The OpenACC programming model offers a directive-based method for porting existing codes to run on hybrid architectures. In this paper, we describe our experiences in porting the Himeno benchmark to run on the Cray XK6 hybrid supercomputer. We describe the OpenACC programming model and the changes needed in the code, both to port the functionality and to tune the performance. Despite the additional PCIe-related overheads when transferring data from one GPU to another over the Cray Gemini interconnect, we find the application gives very good performance and scales well. Of particular interest is the facility to launch OpenACC kernels and data transfers asynchronously, which speeds the Himeno benchmark by 5%–10%. Comparing performance with an optimised code on a similar CPU-based system (using 32 threads per node), we find the OpenACC GPU version to be just under twice the speed in a node-for-node comparison. This speed-up is limited by the computational simplicity of the Himeno benchmark and is likely to be greater for more complicated applications.
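The directive-based porting and the asynchronous launch facility mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: a simplified 7-point Jacobi stencil stands in for the Himeno kernel, and the grid size, the `IDX` macro, the function name `jacobi_solve` and the choice of async queue 1 are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grid dimensions, chosen only for illustration. */
enum { NX = 16, NY = 16, NZ = 16, NCELLS = NX * NY * NZ };

/* Flatten a 3D index into the 1D array layout. */
#define IDX(i, j, k) (((i) * NY + (j)) * NZ + (k))

/* Run `iters` Jacobi sweeps on p, using pnew as scratch space.
   Boundary values of p are left unchanged. */
void jacobi_solve(float *restrict p, float *restrict pnew, int iters)
{
    assert(iters >= 0);

    /* Keep both arrays on the accelerator for the whole solve, so that
       p crosses the PCIe bus only at region entry and exit. */
    #pragma acc data copy(p[0:NCELLS]) create(pnew[0:NCELLS])
    {
        for (int it = 0; it < iters; ++it) {
            /* Kernel 1: average the six neighbours into pnew.
               async(1) queues the kernel and returns control to the host. */
            #pragma acc parallel loop collapse(3) async(1) \
                present(p[0:NCELLS], pnew[0:NCELLS])
            for (int i = 1; i < NX - 1; ++i)
                for (int j = 1; j < NY - 1; ++j)
                    for (int k = 1; k < NZ - 1; ++k)
                        pnew[IDX(i, j, k)] =
                            (p[IDX(i - 1, j, k)] + p[IDX(i + 1, j, k)] +
                             p[IDX(i, j - 1, k)] + p[IDX(i, j + 1, k)] +
                             p[IDX(i, j, k - 1)] + p[IDX(i, j, k + 1)]) / 6.0f;

            /* Kernel 2: copy the interior back into p. Using the same
               queue keeps it ordered after kernel 1. */
            #pragma acc parallel loop collapse(3) async(1) \
                present(p[0:NCELLS], pnew[0:NCELLS])
            for (int i = 1; i < NX - 1; ++i)
                for (int j = 1; j < NY - 1; ++j)
                    for (int k = 1; k < NZ - 1; ++k)
                        p[IDX(i, j, k)] = pnew[IDX(i, j, k)];
        }

        /* Block the host until queue 1 has drained, before the data
           region copies p back to host memory. */
        #pragma acc wait(1)
    }
}
```

A compiler without OpenACC support ignores the pragmas and runs the loops serially on the CPU, which is what makes this style of port incremental. In a multi-node code, the host time freed by the asynchronous launches could be spent on halo exchange; that part is not shown here.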


Bibliographic details

Source
The European Physical Journal Special Topics | 2012, Issue 1 | pp. 5-16 | 12 pages
Authors
A. Hart; R. Ansaloni; A. Gray
Affiliations

Cray Exascale Research Initiative Europe, King’s Buildings, Edinburgh EH9 3JZ, UK

Cray Italy S.r.l., via Motta 10, 20144 Milano, Italy

EPCC, The University of Edinburgh, King’s Buildings, Edinburgh EH9 3JZ, UK

Original format: PDF
Language: English

