A GPU-based algorithm for efficient LES of high Reynolds number flows in heterogeneous CPU/GPU supercomputers

Guillermo Oyarzun; Iason A. Chalmoukis; Georgios A. Leftheriotis; Athanassios A. Dimas

Abstract

An optimized MPI+OpenACC implementation model that performs efficiently in CPU/GPU systems using large-eddy simulation is presented. The code was validated for the simulation of wave boundary-layer flows against numerical and experimental data in the literature. A direct Fast-Fourier-Transform-based solver was developed for the solution of the Poisson equation for pressure taking advantage of the periodic boundary conditions. This solver was optimized for parallel execution in CPUs and outperforms by 10 times in computational time a typical iterative preconditioned conjugate gradient solver in GPUs. In terms of parallel performance, an overlapping strategy was developed to reduce the overhead of performing MPI communications using GPUs. As a result, the weak scaling of the algorithm was improved up to 30%. Finally, a large-scale simulation (Re = 2 × 10^5) using a grid of 4 × 10^8 cells was executed, and the performance of the code was analyzed. The simulation was launched using up to 512 nodes (512 GPUs + 6144 CPU-cores) on one of the current top 10 supercomputers of the world (Piz Daint). A comparison of the overall computational time showed that the GPU version was 4.2 times faster than the CPU one. The parallel efficiency of this strategy (47%) is competitive compared with the state-of-the-art CPU implementations, and it has the potential to take advantage of modern supercomputing capabilities.
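To illustrate the idea behind a direct FFT-based pressure Poisson solve with periodic boundary conditions, here is a minimal NumPy sketch. It is not the authors' solver: the abstract does not give its details, so the function name `solve_poisson_periodic`, the assumption that every direction is periodic, and the use of exact spectral wavenumbers (rather than the modified wavenumbers of a particular finite-difference scheme) are all assumptions made for this example.

```python
import numpy as np

def solve_poisson_periodic(rhs, lengths):
    """Solve laplacian(p) = rhs on a fully periodic box and return the zero-mean p.

    rhs     : n-dimensional array sampled on a uniform grid (hypothetical input)
    lengths : physical box length along each axis
    """
    rhs = np.asarray(rhs, dtype=float)
    # Angular wavenumbers along each axis (fftfreq returns cycles per unit length).
    k_axes = [2.0 * np.pi * np.fft.fftfreq(n, d=L / n)
              for n, L in zip(rhs.shape, lengths)]
    k_grids = np.meshgrid(*k_axes, indexing="ij")
    k2 = sum(k ** 2 for k in k_grids)  # |k|^2 at every Fourier mode

    rhs_hat = np.fft.fftn(rhs)
    zero_mode = (0,) * rhs.ndim
    k2[zero_mode] = 1.0                # avoid dividing by zero at the mean mode
    p_hat = -rhs_hat / k2              # in Fourier space: -|k|^2 p_hat = rhs_hat
    p_hat[zero_mode] = 0.0             # pressure is defined up to a constant; fix mean to 0
    return np.real(np.fft.ifftn(p_hat))
```

Because each Fourier mode is solved by one division, the cost is dominated by the forward and inverse transforms and no iteration is needed, which is the general reason a direct solver of this kind can compete with an iterative preconditioned conjugate gradient method.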

Bibliographic details

Source
Applied Mathematical Modelling, 2020, Issue 9, pp. 141-156 (16 pages)
Authors
Guillermo Oyarzun; Iason A. Chalmoukis; Georgios A. Leftheriotis; Athanassios A. Dimas
Author affiliations

Barcelona Supercomputing Center 08034 Barcelona Spain;

Laboratory of Hydraulic Engineering Department of Civil Engineering University of Patras 26500 Patras Greece;

Laboratory of Hydraulic Engineering Department of Civil Engineering University of Patras 26500 Patras Greece;

Laboratory of Hydraulic Engineering Department of Civil Engineering University of Patras 26500 Patras Greece;

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
OpenACC; GPU architectures; MPI; LES; High Reynolds number flows

Similar literature

1. Resource-efficient utilization of CPU/GPU-based heterogeneous supercomputers for Bayesian phylogenetic inference [J]. Jun Chai, Huayou Su, Mei Wen. Journal of Supercomputing, 2013, Issue 1.
2. Efficient adaptive load balancing approach for compressive background subtraction algorithm on heterogeneous CPU-GPU platforms [J]. Mabrouk Lhoussein, Huet Sylvain, Houzet Dominique. Journal of Real-Time Image Processing, 2020, Issue 5.
3. A Dual Heterogeneous Island Genetic Algorithm for Solving Large Size Flexible Flow Shop Scheduling Problems on Hybrid Multicore CPU and GPU Platforms [J]. Luo Jia, El Baz Didier. Mathematical Problems in Engineering, 2019, Issue 6.
4. Large-scale distributed sorting for GPU-based heterogeneous supercomputers [C]. Shamoto Hideyuki, Shirahata Koichi, Drozd A. IEEE International Congress on Big Data, 2014.
5. Efficient Viewshed Computation Algorithms on GPUs and CPUs [D]. Qarah, Faisal F. 2020.
6. Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines [O]. George Teodoro, Tony Pan, Tahsin Kurc.
7. Hybrid algorithms for efficient Cholesky decomposition and matrix inverse using multicore CPUs with GPU accelerators [O]. Macindoe GI. 2013.
