Conference: Satellite data compression, communications, and processing XI

Parallel Implementation of WRF Double Moment 5-Class Cloud Microphysics Scheme on Multiple GPUs



Abstract

The Weather Research and Forecasting (WRF) Double Moment 5-class (WDM5) mixed-ice microphysics scheme predicts the mixing ratios of hydrometeors and the number concentrations of warm-rain species, including clouds and rain. WDM5 can be computed in parallel in the horizontal domain using multi-core GPUs. To obtain better GPU performance, we manually rewrite the original WDM5 Fortran module as a highly parallel CUDA C program. We explore the use of coalesced memory access and asynchronous data transfer. Our GPU-based WDM5 module scales to run on multiple GPUs. With one NVIDIA Tesla K40 GPU, our optimized implementation of this scheme achieves a speedup of 252x over the corresponding Fortran code running on one CPU core of an Intel Xeon E5-2603, whereas one CPU socket (4 cores) achieves a speedup of only 4.2x over one CPU core. With two NVIDIA Tesla K40 GPUs, the speedup over one CPU core rises to 468x.
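The speedup figures quoted in the abstract can be put in relation to one another with a little arithmetic. The sketch below is not part of the paper: the `scaling_efficiency` helper and the derived ratios are illustrative assumptions, and the only inputs are the three speedups and the core count stated above.

```python
# Speedups relative to one Intel Xeon E5-2603 core, as quoted in the abstract.
SPEEDUP_ONE_GPU = 252.0     # one NVIDIA Tesla K40
SPEEDUP_TWO_GPUS = 468.0    # two NVIDIA Tesla K40s
SPEEDUP_CPU_SOCKET = 4.2    # one CPU socket
CPU_CORES_PER_SOCKET = 4


def scaling_efficiency(speedup_n: float, speedup_1: float, n: int) -> float:
    """Fraction of ideal linear scaling reached when going from 1 to n devices.

    Hypothetical helper (not from the paper): 1.0 means perfect linear scaling.
    """
    return speedup_n / (speedup_1 * n)


# Going from one GPU to two GPUs.
gpu_efficiency = scaling_efficiency(SPEEDUP_TWO_GPUS, SPEEDUP_ONE_GPU, 2)

# Going from one CPU core (speedup 1.0 by definition) to the full socket.
cpu_efficiency = scaling_efficiency(SPEEDUP_CPU_SOCKET, 1.0, CPU_CORES_PER_SOCKET)

# GPU speedups re-expressed against the whole 4-core socket.
one_gpu_vs_socket = SPEEDUP_ONE_GPU / SPEEDUP_CPU_SOCKET
two_gpus_vs_socket = SPEEDUP_TWO_GPUS / SPEEDUP_CPU_SOCKET

print(f"2-GPU scaling efficiency: {gpu_efficiency:.1%}")
print(f"4-core scaling efficiency: {cpu_efficiency:.1%}")
print(f"1 GPU vs. full CPU socket: {one_gpu_vs_socket:.1f}x")
print(f"2 GPUs vs. full CPU socket: {two_gpus_vs_socket:.1f}x")
```

Under these definitions, two GPUs reach roughly 93% of ideal linear scaling, and a single K40 is about 60x faster than the full 4-core socket. These derived numbers follow from the quoted speedups only and are not reported in the abstract itself.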
