Porting the MPI Parallelized LES Model PALM to Multi-GPU Systems - An Experience Report

Abstract

The computational power of graphics processing units (GPUs) and their availability on high performance computing (HPC) systems are rapidly evolving. However, HPC applications must be ported before they can run on such hardware. This paper reports on our experience of porting the MPI + OpenMP parallelized large-eddy simulation model (PALM) to a multi-GPU environment using the directive-based, high-level programming paradigm OpenACC. PALM is a Fortran-based computational fluid dynamics software package used to simulate atmospheric and oceanic boundary layers in order to answer questions linked to fundamental atmospheric turbulence research, urban climate, wind energy and cloud physics. Development of PALM started in 1997; the project currently comprises 140 kLOC and is used on HPC farms of up to 43200 cores. The porting took place during the GPU Hackathon TU Dresden/Forschungszentrum Juelich in Dresden, Germany, in 2016. The main challenges we faced were the legacy code base of PALM and its size. We report the methods used to disentangle performance effects from logical code defects, as well as our experiences with state-of-the-art profiling tools. We present detailed performance tests showing that the overall performance on one GPU can easily compete with up to ten CPU cores.