GPU acceleration of a petascale application for turbulent mixing at high Schmidt number using OpenMP 4.5

Clay M. P.; Buaria D.; Yeung P. K.; Gotoh T.

Abstract

This paper reports on the successful implementation of a massively parallel GPU-accelerated algorithm for the direct numerical simulation of turbulent mixing at high Schmidt number. The work stems from a recent development (Comput. Phys. Commun., vol. 219, 2017, 313-328), in which a low-communication algorithm was shown to attain a high degree of scalability on the Cray XE6 architecture when overlapping communication and computation via dedicated communication threads. An even higher level of performance has now been achieved using OpenMP 4.5 on the Cray XK7 architecture, where on each node the 16 integer cores of an AMD Interlagos processor share a single Nvidia K20X GPU accelerator. In the new algorithm, data movement is minimized by performing virtually all of the intensive scalar-field computations, in the form of combined compact finite difference (CCD) operations, on the GPUs. A memory layout that departs from usual practice is found to provide much better performance for a specific kernel required to apply the CCD scheme. Asynchronous execution, enabled by adding the OpenMP 4.5 NOWAIT clause to TARGET constructs, improves scalability when used to overlap computation on the GPUs with computation and communication on the CPUs. On the 27-petaflops supercomputer Titan at Oak Ridge National Laboratory, USA, a GPU-to-CPU speedup factor of approximately 5 is consistently observed at the largest problem size of 8192³ grid points for the scalar field, computed with 8192 XK7 nodes. © 2018 Elsevier B.V. All rights reserved.
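The following is a minimal C sketch that illustrates the asynchronous-execution idea: a TARGET construct carrying the NOWAIT clause, written with standard OpenMP 4.5 pragmas only. It is not the authors' code. Several parts are illustrative assumptions:

- the function name `derivative_overlapped`;
- the second-order finite-difference kernel, which is a deliberately simple stand-in for the paper's CCD scheme;
- the host-side loop that fills a hypothetical `sendbuf`, which stands in for CPU computation and communication.

With NOWAIT, the target region becomes a deferrable target task, so the encountering thread can keep working. `#pragma omp taskwait` then waits for that task, including the copy-back requested by `map(from: ...)`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not the authors' implementation.
 * Computes du = du/dx of u on a uniform grid of spacing h (second-order
 * central differences in the interior, second-order one-sided at the ends)
 * inside an OpenMP 4.5 TARGET region deferred with NOWAIT, while the host
 * thread fills sendbuf from v (a stand-in for CPU-side work such as packing
 * data for MPI). Requires n >= 3; u, du, v and sendbuf must not alias. */
void derivative_overlapped(const double *u, double *du, size_t n, double h,
                           const double *v, double *sendbuf)
{
    assert(n >= 3);

    #pragma omp parallel
    #pragma omp single
    {
        /* NOWAIT makes this a deferrable target task: the encountering
         * thread continues immediately instead of blocking on the device. */
        #pragma omp target teams distribute parallel for nowait \
            map(to: u[0:n]) map(from: du[0:n])
        for (size_t i = 0; i < n; ++i) {
            if (i == 0)
                du[i] = (-3.0 * u[0] + 4.0 * u[1] - u[2]) / (2.0 * h);
            else if (i == n - 1)
                du[i] = (3.0 * u[n - 1] - 4.0 * u[n - 2] + u[n - 3]) / (2.0 * h);
            else
                du[i] = (u[i + 1] - u[i - 1]) / (2.0 * h);
        }

        /* Independent host work that may overlap with the target task;
         * it touches neither du nor anything the target region writes. */
        for (size_t i = 0; i < n; ++i)
            sendbuf[i] = 2.0 * v[i];

        /* Wait for the child target task, including its map(from:) copy-back. */
        #pragma omp taskwait
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result. With OpenMP but no offload device, the target region falls back to the host. The overlap is only safe because the host loop and the target region write to disjoint arrays.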

Bibliographic details

Source
Computer Physics Communications, 2018, 15 pages
Authors
Clay M. P.; Buaria D.; Yeung P. K.; Gotoh T.
Affiliations

Air Force Res Lab Munit Directorate, Eglin AFB, FL 32542, USA;

Max Planck Inst Dynam & Selforg, D-37077 Gottingen, Germany;

Georgia Inst Technol Sch Aerosp Engn, Atlanta, GA 30332, USA;

Nagoya Inst Technol Dept Phys Sci & Engn, Nagoya, Aichi, Japan
Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Computer applications
Keywords
Turbulence; High Schmidt number; Compact finite differences; Asynchronous GPU computing; OpenMP 4.5; Titan (ORNL);

机译：湍流;高施密特数;紧凑的有限差异;异步GPU计算;Openmp 4.5;泰坦（ornl）;

Similar literature

1. GPU acceleration of a petascale application for turbulent mixing at high Schmidt number using OpenMP 4.5 [J]. Clay M. P., Buaria D., Yeung P. K. Computer Physics Communications, 2018.

机译：使用OpenMP 4.5加速高施密特号码的湍流混合的GPU加速
2. A multithreaded and GPU-optimized compact finite difference algorithm for turbulent mixing at high Schmidt number using petascale computing [J]. M. P. Clay, P. K. Yeung, D. Buaria. Bulletin of the American Physical Society (70th Annual Meeting of the APS Division of Fluid Dynamics), 2017, no. 14.

机译：APS-流体动力学APS部门第70届年会-事件-一种多线程和GPU优化的紧凑有限差分算法，用于使用皮氏计算在高Schmidt数下进行湍流混合
3. A dual communicator and dual grid-resolution algorithm for petascale simulations of turbulent mixing at high Schmidt number [J]. Clay M. P., Buaria D., Gotoh T. Computer Physics Communications, 2017.

机译：高施密特数湍流混合促成吐振模拟的双通信器和双电网分辨率算法
4. The Productivity, Portability and Performance of OpenMP 4.5 for Scientific Applications Targeting Intel CPUs, IBM CPUs, and NVIDIA GPUs [C]. Matt Martineau, Simon McIntosh-Smith. International Workshop on OpenMP, 2017.

机译：适用于针对英特尔CPU，IBM CPU和NVIDIA GPU的科学应用的OpenMP 4.5的生产率，可移植性和性能
5. Turbulent mixing of passive scalars at high Schmidt number [D]. Xu, Shuyi. 2005.

机译：高施密特数时被动标量的湍流混合。
6. High Performance Data Clustering: A Comparative Analysis of Performance for GPU, RASC, MPI, and OpenMP Implementations [O]. Luobin Yang, Steve C. Chiu, Wei-Keng Liao.

机译：高性能数据集群：GPURASCMPI和OpenMP实现的性能比较分析
7. The Productivity, Portability and Performance of OpenMP 4.5 for Scientific Applications Targeting Intel CPUs, IBM CPUs, and NVIDIA GPUs [O]. Matt Martineau, Simon McIntosh-Smith. 2018.

机译：适用于针对Intel CPU，IBM CPU和NVIDIA GPU的科学应用的OpenMP 4.5的生产率，可移植性和性能
