The European Physical Journal Special Topics

Porting and scaling OpenACC applications on massively-parallel, GPU-accelerated supercomputers


Abstract

An increasing number of massively-parallel supercomputers are based on heterogeneous node architectures combining traditional, powerful multicore CPUs with energy-efficient GPU accelerators. Such systems offer high computational performance with modest power consumption. As the industry trend of closer integration of CPU and GPU silicon continues, these architectures are a possible template for future exascale systems. Given the longevity of large-scale parallel HPC applications, it is important that there is a mechanism for easy migration to such hybrid systems. The OpenACC programming model offers a directive-based method for porting existing codes to run on hybrid architectures. In this paper, we describe our experiences in porting the Himeno benchmark to run on the Cray XK6 hybrid supercomputer. We describe the OpenACC programming model and the changes needed in the code, both to port the functionality and to tune the performance. Despite the additional PCIe-related overheads when transferring data from one GPU to another over the Cray Gemini interconnect, we find the application gives very good performance and scales well. Of particular interest is the facility to launch OpenACC kernels and data transfers asynchronously, which speeds the Himeno benchmark by 5%–10%. Comparing performance with an optimised code on a similar CPU-based system (using 32 threads per node), we find the OpenACC GPU version to be just under twice the speed in a node-for-node comparison. This speed-up is limited by the computational simplicity of the Himeno benchmark and is likely to be greater for more complicated applications.

Bibliographic details

  • Authors

    A. Hart, R. Ansaloni, A. Gray

  • Affiliations

    Cray Exascale Research Initiative Europe, King’s Buildings, Edinburgh EH9 3JZ, UK

    Cray Italy S.r.l., via Motta 10, 20144 Milano, Italy

    EPCC, The University of Edinburgh, King’s Buildings, Edinburgh EH9 3JZ, UK

  • Original format: PDF
  • Text language: English