Multilevel Parallelism for CFD Codes on Heterogeneous Platforms

Abstract

High-level parallel programming approaches have recently become popular in complex fluid dynamics research because they are cross-platform and easy to implement. OpenACC is a high-level, directive-based parallel programming standard for offloading program execution onto a graphics processing unit (GPU). Its portable programming model allows program development for CPU and GPU platforms to be effectively unified. The directive-based model contrasts with the detailed implementation model of CUDA and OpenCL, which suffers from poor portability and maintainability as accelerator hardware changes. In this work, we outline guidelines for efficiently annotating the baseline serial CPU version of a CFD code to achieve acceptable multithreaded performance on the GPU. An Artificial Compressibility Method (ACM) is used to study steady-state incompressible 2D and 3D heat and fluid flows. We study loop scheduling with careful attention to the limited on-chip memory available, and we investigate the possibility of asynchronous calculations in the CFD algorithm to further increase computing performance. The PGI Fortran 15.7 compiler is used throughout this study. Memory management and loop scheduling considerations yield an approximately 20% speedup in GPU computing performance over the baseline OpenACC implementation. A performance analysis of the thermal flow solver shows that a single NVIDIA Tesla C2075 GPU card is comparable in speed to 8 cores of a dual-socket Xeon E5-2687W workstation. A multi-GPU performance analysis is performed using NVIDIA Tesla M2050 GPUs on the Virginia Tech HokieSpeed supercomputer. A weak scalability analysis demonstrates up to 92% efficiency with 32 GPUs.
With 16 GPUs, the thermal flow solver runs up to 190 times faster than a single core of a Xeon E5645 processor, 10 times faster than a single GPU, and 12 times faster than 16 CPUs distributed on separate sockets using an MPI multi-CPU implementation of the code.
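In one common form of the Artificial Compressibility Method, the continuity equation is replaced by a pseudo-time equation, ∂p/∂τ + β² ∇·u = 0 (some texts write β instead of β²), so pressure is driven toward a divergence-free velocity field as pseudo-time τ advances. The sketch below shows only that pressure update on a uniform 2D grid, written as the doubly nested loop that a directive-based GPU port would annotate. It is a minimal serial illustration under stated assumptions, not the authors' solver: the central-difference discretization, the explicit pseudo-time step, and the function name `acm_pressure_update` are hypothetical, and the paper's actual Fortran code and OpenACC directives are not given in this abstract.

```python
def acm_pressure_update(p, u, v, dx, dy, dtau, beta2):
    """Advance pressure one explicit pseudo-time step of dp/dtau = -beta2 * div(u).

    p, u, v are lists of lists indexed [i][j] (i along x, j along y) on a
    uniform grid. Only interior nodes are updated; boundary values are copied
    unchanged. The input p is not modified.
    """
    nx, ny = len(p), len(p[0])
    p_new = [row[:] for row in p]  # copy so boundary nodes keep their values
    # In a Fortran OpenACC port, a loop nest like this is the kind that would
    # typically carry a directive such as "!$acc parallel loop collapse(2)".
    # (Assumption: the paper's exact directives and clauses are not shown.)
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            # Second-order central differences for the velocity divergence.
            div = ((u[i + 1][j] - u[i - 1][j]) / (2.0 * dx)
                   + (v[i][j + 1] - v[i][j - 1]) / (2.0 * dy))
            p_new[i][j] = p[i][j] - dtau * beta2 * div
    return p_new
```

For a divergence-free field such as u = x, v = -y the interior pressure is left unchanged, while for u = x, v = y (divergence 2) each interior node drops by 2·dτ·β². A full ACM solver would interleave this update with the momentum equations and iterate until the pseudo-time residuals vanish.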

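The abstract reports both a weak-scaling efficiency and several speedups. The helpers below encode the standard definitions of these metrics as a small sketch; the function names are hypothetical, and any timings used with them are illustrative, not measurements from the paper.

```python
def speedup(t_ref, t_parallel):
    """Speedup of a run relative to a reference run of the same problem."""
    return t_ref / t_parallel


def weak_scaling_efficiency(t_one, t_n):
    """Weak scaling: work per device is held fixed, so ideally t_n == t_one."""
    return t_one / t_n


def strong_scaling_efficiency(t_one, t_n, n_devices):
    """Strong scaling: total problem is fixed, so ideal speedup equals n_devices."""
    return speedup(t_one, t_n) / n_devices
```

Read this way, 92% weak-scaling efficiency on 32 GPUs means the 32-GPU run takes about 1/0.92 ≈ 1.09 times as long as a single-GPU run with the same per-GPU load. If the 10x speedup over a single GPU was measured at a fixed total problem size (the abstract does not say), it would correspond to a strong-scaling efficiency of 10/16 = 62.5% on 16 GPUs.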

Bibliographic Information

Source: AIAA Aviation Forum; AIAA Fluid Dynamics Conference, 2016, pp. 889-903 (15 pages)
Authors: Behzad Baghapour; Andrew McCall; Christopher J. Roy
Original format: PDF

Similar Publications

1. Assessing and discovering parallelism in C++ code for heterogeneous platforms [J]. del Rio Astorga David, Sotomayor Rafael, Miguel Sanchez Luis. Journal of Supercomputing, 2018, No. 11.
2. Time-energy analysis of multilevel parallelism in heterogeneous clusters: the case of EEG classification in BCI tasks [J]. Jose Escobar Juan, Ortega Julio, Diaz Antonio F. Journal of Supercomputing, 2019, No. 7.
3. Multilevel coding with LDPC component codes for power and bandwidth efficiency [D]. Limpaphayom, Piraporn. 2003.
4. A 3D CFD model of the interstitial fluid pressure and drug distribution in heterogeneous tumor nodules during intraperitoneal chemotherapy [O]. Margo Steuperaert, Charlotte Debbaut, Charlotte Carlier. 2019.
5. Effective Cross-Platform, Multilevel Parallelism via Dynamic Adaptive Execution [O]. Walden Ko, Mark Yankelevsky, Dimitrios S. Nikolopoulos. 2002.