Journal: Parallel Processing Letters

Using Compiler Directives to Port Large Scientific Applications to GPUs: An Example from Atmospheric Science


Abstract

For many scientific applications, Graphics Processing Units (GPUs) are an attractive alternative to conventional CPUs because they can deliver higher memory bandwidth and computing power. While it is conceivable to rewrite the most time-consuming parts of an application using a low-level API for accelerator programming, doing so for the entire application may not be feasible. However, running only selected parts of the application on the GPU requires repeated data transfers between the GPU and the host CPU, which may incur a serious performance penalty. In this paper we assess the potential of compiler directives, based on the OpenACC standard, for porting large parts of a code and thus achieving a full GPU implementation. As an illustrative and relevant example, we consider the climate and numerical weather prediction code COSMO (Consortium for Small Scale Modeling) and focus on the physical parametrizations, the part of the code that describes all physical processes not accounted for by the fundamental equations of atmospheric motion. By porting three of the dominant parametrization schemes (radiation, microphysics and turbulence), we show that compiler directives are an efficient tool in terms of both final execution time and implementation effort. Compiler directives make it possible to port large sections of the existing code with minor modifications while still allowing further optimization of the most performance-critical parts. Using the radiation parametrization, which involves the solution of a block tri-diagonal linear system, as an example, we discuss the required code modifications and key optimizations in detail. Performance tests for the three physical parametrizations show a speedup of between 3× and 7× on a GPU relative to a multi-core CPU of an equivalent generation.
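To make the directive-based approach concrete, the following is a minimal sketch in C with OpenACC pragmas. It is not taken from COSMO: the function name `solve_columns`, the `IDX` layout macro, and the array names are hypothetical, and a scalar tridiagonal solve (Thomas algorithm) stands in for the block tri-diagonal system mentioned in the abstract. It assumes that each vertical column is an independent system, so the column loop can run in parallel while the vertical recurrence stays sequential. A single `data` region encloses both kernels so the arrays stay on the device between them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: element (k, i) of an nz-by-ncol field is stored at
   k * ncol + i, so neighbouring columns are contiguous in memory. */
#define IDX(k, i, ncol) ((size_t)(k) * (size_t)(ncol) + (size_t)(i))

/* Solve ncol independent tridiagonal systems of size nz.
   a: sub-diagonal, b: diagonal, c: super-diagonal, d: right-hand side.
   x receives the solution. c and d serve as scratch space and may be
   overwritten (they are when the code runs on the host). */
void solve_columns(int ncol, int nz,
                   const double *a, const double *b,
                   double *c, double *d, double *x)
{
    assert(ncol > 0 && nz > 1);
    size_t n = (size_t)ncol * (size_t)nz;

    /* One data region around both kernels: inputs are copied to the device
       once, stay resident there, and only x is copied back at the end. */
    #pragma acc data copyin(a[0:n], b[0:n], c[0:n], d[0:n]) copyout(x[0:n])
    {
        /* Forward elimination: parallel over columns, sequential in k. */
        #pragma acc parallel loop
        for (int i = 0; i < ncol; ++i) {
            c[IDX(0, i, ncol)] /= b[IDX(0, i, ncol)];
            d[IDX(0, i, ncol)] /= b[IDX(0, i, ncol)];
            #pragma acc loop seq
            for (int k = 1; k < nz; ++k) {
                double m = b[IDX(k, i, ncol)]
                         - a[IDX(k, i, ncol)] * c[IDX(k - 1, i, ncol)];
                c[IDX(k, i, ncol)] /= m;
                d[IDX(k, i, ncol)] = (d[IDX(k, i, ncol)]
                         - a[IDX(k, i, ncol)] * d[IDX(k - 1, i, ncol)]) / m;
            }
        }

        /* Back substitution as a second kernel; no host transfer in between. */
        #pragma acc parallel loop
        for (int i = 0; i < ncol; ++i) {
            x[IDX(nz - 1, i, ncol)] = d[IDX(nz - 1, i, ncol)];
            #pragma acc loop seq
            for (int k = nz - 2; k >= 0; --k)
                x[IDX(k, i, ncol)] = d[IDX(k, i, ncol)]
                                   - c[IDX(k, i, ncol)] * x[IDX(k + 1, i, ncol)];
        }
    }
}
```

Built with an OpenACC-capable compiler (for example `nvc -acc` or `gcc -fopenacc`), the two loops over `i` can be offloaded. Without OpenACC support the pragmas are ignored and the same source runs serially on the CPU, which is one reason directives suit incremental porting of an existing code base.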
