Journal of Computational Physics

Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

Abstract

Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when CPUs and accelerators must collaborate to fully tap the potential of heterogeneous systems. In this paper, using a tri-level hybrid and heterogeneous programming model (MPI+OpenMP+CUDA), we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact finite difference schemes, WCNS and HDCS, that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform kernel optimizations tailored to high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3× when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we make the CPU and GPU collaborate in HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the memory-poor GPU and the memory-rich CPU. By taking CPU–GPU load balance into account, we increase the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×, while the collaborative approach improves performance by around 45% compared with the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization that minimizes PCI-e data transfer times for the ghost and singularity data of 3D grid blocks. We also overlap the collaborative computation and communication as far as possible using advanced CUDA and MPI features. Scalability tests show that HOSTA achieves a parallel efficiency above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. To the best of our knowledge, these are the largest-scale CPU–GPU collaborative simulations that solve realistic CFD problems with both complex configurations and high-order schemes.
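The abstract does not spell out how work is divided between a GPU with limited memory and a CPU with plenty of it. The sketch below shows one common way to make such a static split. It is not HOSTA's actual scheme, and it rests on three assumptions:

- Each device processes cells at a known, constant rate.
- The GPU's memory capacity is known as a number of cells.
- Whole grid blocks are the unit of assignment.

Under these assumptions the GPU's ideal share is proportional to its throughput, because both devices then finish a step at the same time. That share is capped at what fits in GPU memory, and the remaining blocks stay on the CPU. `Block`, `split_blocks`, `step_time`, and every parameter are hypothetical names.

```python
# Hypothetical sketch of a static CPU/GPU split for a multi-block grid.
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    cells: int  # number of grid cells in this block


def split_blocks(blocks, gpu_rate, cpu_rate, gpu_capacity):
    """Assign whole blocks to the GPU or the CPU.

    gpu_rate, cpu_rate: assumed throughput of each device (cells per second).
    gpu_capacity: assumed number of cells that fit in GPU memory.
    Returns (gpu_blocks, cpu_blocks).
    """
    total = sum(b.cells for b in blocks)
    # Both devices finish together when n_gpu / gpu_rate == n_cpu / cpu_rate,
    # i.e. when the GPU share is proportional to its throughput.
    ideal_gpu = total * gpu_rate / (gpu_rate + cpu_rate)
    # The memory-limited GPU cannot take more than it can store;
    # everything else stays in the larger CPU memory.
    target = min(ideal_gpu, gpu_capacity)

    gpu, cpu = [], []
    gpu_cells = 0
    # Greedy heuristic: try large blocks first, skip any that would overshoot.
    for b in sorted(blocks, key=lambda blk: blk.cells, reverse=True):
        if gpu_cells + b.cells <= target:
            gpu.append(b)
            gpu_cells += b.cells
        else:
            cpu.append(b)
    return gpu, cpu


def step_time(gpu, cpu, gpu_rate, cpu_rate):
    """Estimated time per step: the slower device dominates (ignores transfers)."""
    return max(sum(b.cells for b in gpu) / gpu_rate,
               sum(b.cells for b in cpu) / cpu_rate)


blocks = [Block("b0", 40), Block("b1", 30), Block("b2", 20), Block("b3", 10)]
gpu, cpu = split_blocks(blocks, gpu_rate=1.3, cpu_rate=1.0, gpu_capacity=50)
print([b.name for b in gpu], [b.name for b in cpu])  # ['b0', 'b3'] ['b1', 'b2']
```

The example uses a few illustrative inputs. Block sizes are in arbitrary units. The rate ratio of 1.3 is borrowed from the GPU-only speedup quoted above, purely as an input. The ideal GPU share would be about 56.5 units, but the cap of 50 binds. The GPU therefore receives `b0` and `b3`, and the CPU receives the rest.

Assigning whole blocks keeps each block's stencil sweeps on a single device. Splitting inside a block could balance more finely, but it would add CPU–GPU halo traffic within that block.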
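The gather/scatter idea can be illustrated on a single 3D block. Assume the block is stored as one flat array with `i` varying fastest, a Fortran-style layout. The ghost layers on most faces are then spread over many separate memory segments, so copying them directly across PCI-e would take many small transfers. Packing (gathering) a face into one contiguous buffer allows a single transfer, after which the receiver unpacks (scatters) it.

This is a sketch only, with several simplifications:

- In a real code the gather and scatter would run as GPU kernels. Here plain Python lists stand in for device memory.
- Only face ghost layers are handled, not singularity data.
- The arrays are assumed to include their ghost layers, so a receiver's ghost region is simply its outermost layers.
- All function names are hypothetical.

```python
# Hypothetical sketch: pack the layers next to one face of a 3D block into a
# contiguous buffer (gather) and unpack them on the receiving side (scatter).


def flat_index(i, j, k, ni, nj):
    # Assumed layout: i varies fastest, then j, then k (Fortran-style ordering).
    return i + ni * (j + nj * k)


def face_ranges(shape, axis, side, width):
    """Index ranges covering the `width` layers next to one face of the block."""
    ranges = [range(n) for n in shape]
    n = shape[axis]
    ranges[axis] = range(0, width) if side == "low" else range(n - width, n)
    return ranges


def face_indices(shape, axis, side, width):
    """Flat indices of the face region, in the order used for packing."""
    ni, nj, _ = shape
    ri, rj, rk = face_ranges(shape, axis, side, width)
    return [flat_index(i, j, k, ni, nj) for k in rk for j in rj for i in ri]


def gather_face(field, shape, axis, side, width):
    """Copy a (possibly strided) face region into one contiguous buffer."""
    return [field[p] for p in face_indices(shape, axis, side, width)]


def scatter_face(field, shape, axis, side, width, buf):
    """Write a contiguous buffer back into a face region, in gather order."""
    idx = face_indices(shape, axis, side, width)
    if len(buf) != len(idx):
        raise ValueError("buffer size does not match face region")
    for p, value in zip(idx, buf):
        field[p] = value


def contiguous_runs(shape, axis, side, width):
    """Number of separate memory segments an unpacked, direct copy would need."""
    idx = sorted(face_indices(shape, axis, side, width))
    return 1 + sum(1 for a, b in zip(idx, idx[1:]) if b != a + 1)


shape = (4, 3, 2)
sender = list(range(24))
buf = gather_face(sender, shape, axis=0, side="high", width=1)  # [3, 7, 11, 15, 19, 23]
receiver = [0] * 24
# The high-i layer of the sender fills the low-i (ghost) layer of the receiver.
scatter_face(receiver, shape, axis=0, side="low", width=1, buf=buf)
```

Consider a 4×3×2 block with one ghost layer. `contiguous_runs` reports 6 segments for an `i`-face, 2 for a `j`-face, and 1 for a `k`-face. Packing therefore pays off most for faces normal to the fastest-varying index.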