Journal of Supercomputing

Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations


Abstract

We present a hybrid GPU implementation and performance analysis of Nekbone, which represents one of the core kernels of the incompressible Navier-Stokes solver Nek5000. The implementation uses OpenACC and CUDA Fortran for local parallelization of the compute-intensive matrix-matrix multiplication part, which keeps modification of the existing CPU code to a minimum while extending the simulation capability of the code to GPU architectures. Our discussion includes the GPU results of OpenACC interoperating with CUDA Fortran and the gather-scatter operations with GPUDirect communication. We demonstrate performance of up to 552 Tflops on 16,384 GPUs of the OLCF Cray XK7 Titan.
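
To make the idea of directive-based local parallelization of a small matrix-matrix multiplication concrete, here is a minimal sketch in C with an OpenACC directive. It is not the authors' code (Nekbone itself is written in Fortran). The function name `local_deriv`, the sizes `N` and `NELEM`, and the flat array layout are all assumptions chosen for illustration. The loop body is ordinary CPU code and the directive is the only GPU-specific addition, which is the low-modification property the abstract describes.

```c
/* Hypothetical sizes, chosen only for illustration. */
#define N 4      /* grid points per direction inside one element */
#define NELEM 8  /* number of elements */

/* Apply the N x N matrix d along the fastest index of every element:
 *   ur(e,k,j,i) = sum_l d(i,l) * u(e,k,j,l)
 * All arrays are stored flat in row-major order. */
void local_deriv(const double *d, const double *u, double *ur)
{
    /* A compiler without OpenACC support ignores this pragma and
     * runs the same loops serially on the CPU. */
    #pragma acc parallel loop collapse(4) \
        copyin(d[0:N*N], u[0:NELEM*N*N*N]) copyout(ur[0:NELEM*N*N*N])
    for (int e = 0; e < NELEM; ++e) {
        for (int k = 0; k < N; ++k) {
            for (int j = 0; j < N; ++j) {
                for (int i = 0; i < N; ++i) {
                    double s = 0.0;
                    #pragma acc loop seq
                    for (int l = 0; l < N; ++l) {
                        s += d[i * N + l] * u[((e * N + k) * N + j) * N + l];
                    }
                    ur[((e * N + k) * N + j) * N + i] = s;
                }
            }
        }
    }
}
```

Passing an identity matrix for `d` should return `ur` equal to `u`, which gives a quick sanity check for either the CPU or the GPU build.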
