Journal: Applied Mathematical Modelling

A GPU-based algorithm for efficient LES of high Reynolds number flows in heterogeneous CPU/GPU supercomputers


Abstract

An optimized MPI+OpenACC implementation model for large-eddy simulation (LES) that performs efficiently in CPU/GPU systems is presented. The code was validated for the simulation of wave boundary-layer flows against numerical and experimental data in the literature. A direct Fast-Fourier-Transform-based solver was developed for the pressure Poisson equation, taking advantage of the periodic boundary conditions. This solver was optimized for parallel execution in CPUs and outperforms a typical iterative preconditioned conjugate gradient solver in GPUs by a factor of 10 in computational time. In terms of parallel performance, an overlapping strategy was developed to reduce the overhead of performing MPI communications using GPUs. As a result, the weak scaling of the algorithm was improved by up to 30%. Finally, a large-scale simulation (Re = 2 × 10^5) using a grid of 4 × 10^8 cells was executed, and the performance of the code was analyzed. The simulation was launched using up to 512 nodes (512 GPUs + 6144 CPU cores) on one of the current top 10 supercomputers in the world (Piz Daint). A comparison of the overall computational time showed that the GPU version was 4.2 times faster than the CPU one. The parallel efficiency of this strategy (47%) is competitive with state-of-the-art CPU implementations, and it has the potential to take advantage of modern supercomputing capabilities.
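To illustrate why periodic boundary conditions allow a direct Fourier-based Poisson solve, here is a minimal one-dimensional sketch in pure Python. It is not the paper's solver: the function names `fft` and `solve_poisson_periodic`, the second-order central-difference discretization, and the zero-mean convention for the solution are all assumptions made for this example. The paper's solver targets the pressure field in large parallel simulations.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def solve_poisson_periodic(f, length):
    """Solve the discrete problem u'' = f on a periodic grid (hypothetical helper).

    Assumes a uniform grid with len(f) points (a power of two), a zero-mean
    right-hand side, and second-order central differences.
    """
    n = len(f)
    h = length / n
    f_hat = fft([complex(v) for v in f])
    u_hat = [0j] * n
    for k in range(1, n):
        # Eigenvalue of the periodic central-difference operator for mode k.
        lam = (2.0 * math.cos(2.0 * math.pi * k / n) - 2.0) / (h * h)
        u_hat[k] = f_hat[k] / lam
    # Mode k = 0 stays zero: this fixes the arbitrary constant (zero-mean u).
    u = fft(u_hat, inverse=True)
    return [v.real / n for v in u]
```

Each Fourier mode is an eigenvector of the periodic difference operator, so the solve reduces to one division per mode with no iteration. That is the property a direct FFT-based solver exploits in place of an iterative method such as preconditioned conjugate gradient.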
