Journal: Procedia Computer Science

Multi-GPU Implementation of a 3D Finite Difference Time Domain Earthquake Code on Heterogeneous Supercomputers


Abstract

We have developed a highly scalable 3D finite difference GPU code for regional petascale earthquake simulations, intended for use in earthquake engineering and disaster management. This MPI-CUDA code is based on AWP-ODC, a widely used wave propagation code, and has been restructured for high throughput and efficiency on heterogeneous computing architectures. We present an effective communication reduction technique that leverages GPUs with minimal PCI-e overhead, and a novel overlapping method that fully hides data communication latency between GPUs. The optimization concepts used in this work can be extended to general stencil computing on structured grids. Benchmarks demonstrated a sustained 100 TFlops in single precision for 49 billion mesh points using 952 GPUs on the NCCS Titan Phase 5 system, a 77-fold speedup over the CPU version of the code. This multi-GPU implementation has been validated and used for a large-scale verification wave propagation simulation of the Mw 5.4 Chino Hills earthquake using 128 GPUs.
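The abstract does not spell out the overlapping method, so the following is only a generic sketch of the idea commonly used to hide halo-exchange latency in stencil codes: update the boundary cells first, start sending them, and compute the interior while the transfer is in flight. It uses a 1D second-order stencil in plain Python; the function names (`stencil_point`, `step_full`, `step_split`), the `halo` parameter, and the coefficient `c` are hypothetical and are not taken from AWP-ODC.

```python
# Illustrative sketch only; not the AWP-ODC implementation.
# A 1D second-order stencil update, reordered into "boundary first,
# interior second", the ordering that allows communication/computation
# overlap in a multi-GPU stencil code.

def stencil_point(u, i, c):
    """Second-order finite-difference update for one grid point."""
    return u[i] + c * (u[i - 1] - 2.0 * u[i] + u[i + 1])

def step_full(u, c):
    """Reference: update all non-edge points in a single pass."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = stencil_point(u, i, c)
    return new

def step_split(u, c, halo=1):
    """Same update, with the boundary cells computed before the interior.

    Assumes len(u) >= 2 + 2 * halo so the two boundary strips do not overlap.
    """
    n = len(u)
    new = list(u)
    # 1) Boundary strips: the values a neighboring subdomain would need.
    boundary = list(range(1, 1 + halo)) + list(range(n - 1 - halo, n - 1))
    for i in boundary:
        new[i] = stencil_point(u, i, c)
    # (Assumption) In an MPI-CUDA code, a non-blocking transfer of the
    # boundary strips would be started here, before the interior update.
    # 2) Interior: independent of the transfer, so it can run concurrently.
    for i in range(1 + halo, n - 1 - halo):
        new[i] = stencil_point(u, i, c)
    return new
```

Both functions read only from the old array and write to a new one, so the reordering does not change the result; it only changes when the boundary values become available for communication.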
