Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

Chuanfu Xu; Xiaogang Deng; Lilun Zhang; Jianbin Fang; Guangxue Wang; Yi Jiang; Wei Cao; Yonggang Che; Yongxian Wang; Zhenghua Wang; Wei Liu; Xinghua Cheng


Abstract

Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when CPUs and accelerators must collaborate to fully tap the potential of heterogeneous systems. In this paper, using a tri-level hybrid and heterogeneous programming model based on MPI+OpenMP+CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact finite difference schemes, WCNS and HDCS, that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform kernel optimizations specific to high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we have the CPU and GPU collaborate in HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we increase the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×, while the collaborative approach improves performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using some advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. To the best of our knowledge, these are the largest-scale CPU–GPU collaborative simulations that solve realistic CFD problems with both complex configurations and high-order schemes.
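
The load-balancing idea in the abstract can be illustrated with a minimal Python sketch. This is not HOSTA's actual scheme, which the abstract does not detail. It assumes each device processes cells at a fixed rate, that work can be split at cell granularity, and that the GPU can hold only a limited number of cells in its memory. The function names `split_cells` and `step_time`, the rates, and the capacity values are all hypothetical.

```python
def split_cells(total_cells, gpu_rate, cpu_rate, gpu_capacity_cells):
    """Split grid cells between one GPU and the host CPUs (hypothetical helper).

    gpu_rate and cpu_rate are assumed throughputs in cells per unit time.
    gpu_capacity_cells is the assumed maximum number of cells that fit in
    GPU memory. Returns (gpu_cells, cpu_cells).
    """
    if gpu_rate <= 0 or cpu_rate <= 0:
        raise ValueError("rates must be positive")
    # Throughput-proportional share: both devices would finish together.
    ideal_gpu_cells = int(total_cells * gpu_rate / (gpu_rate + cpu_rate))
    # The GPU cannot take more cells than its memory can hold;
    # the remainder goes to the CPU, which has more memory.
    gpu_cells = min(ideal_gpu_cells, gpu_capacity_cells)
    return gpu_cells, total_cells - gpu_cells


def step_time(gpu_cells, cpu_cells, gpu_rate, cpu_rate):
    """Time for one step when both devices compute concurrently."""
    return max(gpu_cells / gpu_rate, cpu_cells / cpu_rate)


if __name__ == "__main__":
    # Illustrative numbers only: GPU 1.3x as fast as the CPUs.
    for capacity in (10_000, 1_000):
        gpu, cpu = split_cells(2300, 13, 10, capacity)
        print(capacity, gpu, cpu, step_time(gpu, cpu, 13, 10))
```

When the GPU capacity is not binding, both devices finish at the same time. When it is binding, the CPU receives the overflow and becomes the bottleneck, which is the trade-off between problem size and speed that a real balancing scheme has to manage. A production scheme would also need to account for PCI-e transfer and communication costs, which this sketch ignores.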


Bibliographic details

Source: Journal of Computational Physics, 2014 (issue not given), 23 pages
Authors: Chuanfu Xu; Xiaogang Deng; Lilun Zhang; Jianbin Fang; Guangxue Wang; Yi Jiang; Wei Cao; Yonggang Che; Yongxian Wang; Zhenghua Wang; Wei Liu; Xinghua Cheng
Original format: PDF
Language: English
Chinese Library Classification: Mathematical and physical methods
Keywords: GPU parallelization; CFD; CPU–GPU collaboration; High-order finite difference scheme; Multi-block structured grid


Similar literature

1. Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer [J]. Chuanfu Xu, Xiaogang Deng, Lilun Zhang. Journal of Computational Physics, 2014 (issue not given).
2. CPU/GPU computing for a multi-block structured grid based high-order flow solver on a large heterogeneous system [J]. Wei Cao, Chuan-fu Xu, Zheng-hua Wang. Cluster Computing, 2014, No. 2.
3. Efficient parallel implementation of large scale 3D structured grid CFD applications on the Tianhe-1A supercomputer [J]. Wang Yong-Xian, Zhang Li-Lun, Liu Wei. Computers & Fluids, 2013 (issue not given).
4. Balancing CPU-GPU Collaborative High-Order CFD Simulations on the Tianhe-1A Supercomputer [C]. Xu Chuanfu, Zhang Lilun, Deng Xiaogang. IEEE International Parallel and Distributed Processing Symposium, 2014.
5. Large-Scale Complex Systems: From Antenna Circuits to Power Grids [D]. Lavaei, Javad. 2011.
6. Virtual Grid Engine: a simulated grid engine environment for large-scale supercomputers [O]. Satoshi Ito, Masaaki Yadome, Tatsuo Nishiki. 2019.
7. An Optimized Large-Scale Hybrid DGEMM Design for CPUs and ATI GPUs [O]. Jiajia Li, Xingjian Li, Guangming Tan. 2015.
