Journal: Concurrency and Computation: Practice and Experience
An analysis of the feasibility and benefits of GPU/multicore acceleration of the Weather Research and Forecasting model


Abstract

There is a growing need for ever more accurate climate and weather simulations to be delivered on shorter timescales, in particular to guard against severe weather events such as hurricanes and heavy rainfall. Due to climate change, the severity and frequency of such events, and thus their economic impact, are set to rise dramatically. Hardware acceleration using graphics processing units (GPUs) or field-programmable gate arrays (FPGAs) could potentially result in much reduced run times or higher-accuracy simulations. In this paper, we present the results of a study of the Weather Research and Forecasting (WRF) model, undertaken to assess whether GPU and multicore acceleration of this type of numerical weather prediction (NWP) code is both feasible and worthwhile. The focus of this paper is on accelerating code running on a single compute node by offloading parts of the code to an accelerator such as a GPU. The governing equation set of the WRF model is based on compressible, non-hydrostatic atmospheric motion with multi-physics processes. We put this work into context by discussing its more general applicability to multi-physics fluid dynamics codes: in many fluid dynamics codes, the numerical schemes for the advection terms are based on finite differences between neighboring cells, as in the WRF code. For fluid systems that include multi-physics processes, there are many calls to these advection routines. This class of numerical codes will benefit from hardware acceleration. We studied the performance of the original WRF code and proposed a simple model for comparing multicore CPU and GPU performance. Based on the results of extensive profiling of representative WRF runs, we focused on accelerating the scalar advection module. We discuss the implementation of this module as a data-parallel kernel in both OpenCL and OpenMP.
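As a rough illustration of why a finite-difference advection routine maps well onto a data-parallel kernel, the sketch below advances a scalar field by one time step with a first-order upwind scheme on a periodic 1-D grid. This is a deliberately simplified stand-in, not the scheme used in WRF or the kernel described in the paper; the function name `advect_upwind` and its parameters are hypothetical.

```python
def advect_upwind(q, u, dx, dt):
    """One explicit step of dq/dt + u * dq/dx = 0 on a periodic 1-D grid.

    Assumes a constant wind speed u >= 0 (hypothetical simplification).
    Each new cell value depends only on the OLD values of the cell and
    its left neighbour, so all cells can be updated independently; this
    is the property a data-parallel kernel exploits.
    """
    c = u * dt / dx  # Courant number; the scheme is stable for 0 <= c <= 1
    if not 0.0 <= c <= 1.0:
        raise ValueError("Courant number must lie in [0, 1]")
    n = len(q)
    # q[i - 1] wraps to q[-1] at i == 0, which gives the periodic boundary.
    return [q[i] - c * (q[i] - q[i - 1]) for i in range(n)]


if __name__ == "__main__":
    field = [0.0, 1.0, 0.0, 0.0]
    print(advect_upwind(field, u=1.0, dx=1.0, dt=0.5))
```

Because the loop body reads only old values, the list comprehension could in principle be replaced by one work-item per cell on a GPU or one loop iteration per thread on a multicore CPU.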
We show that our data-parallel kernel version of the scalar advection module runs up to seven times faster on the GPU than the original code on the CPU. However, as the data transfer cost between GPU and CPU is very high (as shown by our analysis), the fully integrated code achieves only a small speed-up (two times). We show that it would be possible to offset the data transfer cost through GPU acceleration of a larger portion of the dynamics code. In order to carry out this research, we also developed an extensible software system for integrating OpenCL code into large Fortran code bases such as WRF. This is one of the main contributions of our work. We discuss the system to show how it allows sections of the original code base to be replaced with their OpenCL counterparts with minimal changes (literally only a few lines) to the original code. Our final assessment is that, even with current system architectures, accelerating WRF, and hence other similar multi-physics fluid dynamics codes, by a factor of up to five is definitely an achievable goal. Accelerating multi-physics fluid dynamics codes, including NWP codes, is vital for their application to weather forecasting, environmental pollution warning, and emergency response to the dispersion of hazardous materials. Implementing hardware acceleration capability for fluid dynamics and NWP codes is a prerequisite for exploiting current and future computer architectures. Copyright © 2015 John Wiley & Sons, Ltd.
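The effect of transfer cost on overall speed-up can be sketched with a simple Amdahl-style estimate. This is not the performance model proposed in the paper; the function `offload_speedup` and all timings below are hypothetical, chosen only to show how a fast kernel can yield a modest overall gain, and how offloading more of the code raises it.

```python
def offload_speedup(t_kernel, t_rest, kernel_speedup, t_transfer):
    """Estimate the overall speed-up from offloading one part of a program.

    t_kernel       -- CPU time of the part that is offloaded
    t_rest         -- CPU time of everything that stays on the CPU
    kernel_speedup -- how much faster the offloaded part runs on the GPU
    t_transfer     -- added host/device data transfer time
    All values are assumed inputs in the same time unit.
    """
    t_before = t_kernel + t_rest
    t_after = t_kernel / kernel_speedup + t_rest + t_transfer
    return t_before / t_after


if __name__ == "__main__":
    # Hypothetical timings, not measurements from the paper.
    print(offload_speedup(70.0, 30.0, 7.0, 0.0))   # no transfer cost
    print(offload_speedup(70.0, 30.0, 7.0, 10.0))  # with transfer cost
    print(offload_speedup(90.0, 10.0, 7.0, 10.0))  # larger offloaded portion
```

In this sketch the transfer term is a fixed cost, so enlarging the offloaded portion while keeping the transfer volume similar spreads that cost over more accelerated work.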
