Journal: Computers & Fluids

Large-scale parallelization based on CPU and GPU cluster for cosmological fluid simulations

Abstract

We present our parallel implementation for large-scale cosmological simulations of 3D supersonic fluids on CPU and GPU clusters. Our developments are based on a CPU code named WIGEON. Compared to the original sequential Fortran code, a speedup of 19-31 (depending on the specific GPU card) can be achieved on a single GPU. Furthermore, our results show that the pure MPI parallelization scales very well up to 10 thousand CPU cores. In addition, a hybrid CPU/GPU parallelization scheme is introduced, and a detailed analysis of the speedup and scaling for different numbers of CPU/GPU units is presented (up to 256 GPU cards, due to computing resource limitations). Our high scalability and speedup rely on the domain decomposition approach, optimization of the algorithm, and a series of techniques to optimize the CUDA implementation, especially the memory access pattern on the GPU. We believe this hybrid MPI + CUDA code can be an excellent candidate for 10 Peta-scale computing and beyond. (C) 2014 Elsevier Ltd. All rights reserved.
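The abstract credits the scaling to a domain decomposition approach but does not describe the scheme itself. As a minimal sketch only, assuming a regular 3D block decomposition (an assumption, not the documented WIGEON layout), the following Python shows how a global grid could be divided among a Cartesian arrangement of processes. The names `split_axis` and `decompose` are hypothetical, and the ghost-cell exchange that an MPI code would need between neighbouring blocks is omitted.

```python
from itertools import product


def split_axis(n_cells, n_parts):
    """Split n_cells into n_parts contiguous half-open ranges.

    Range sizes differ by at most one cell, so the load stays balanced
    even when n_cells is not divisible by n_parts.
    """
    base, extra = divmod(n_cells, n_parts)
    ranges = []
    start = 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def decompose(grid_shape, proc_shape):
    """Map each process coordinate to its (start, stop) cell range per axis.

    Assumption: a plain block decomposition; the paper's actual scheme
    is not given in the abstract.
    """
    per_axis = [split_axis(n, p) for n, p in zip(grid_shape, proc_shape)]
    coords = product(*(range(p) for p in proc_shape))
    return {
        c: tuple(per_axis[axis][c[axis]] for axis in range(len(c)))
        for c in coords
    }


if __name__ == "__main__":
    # Hypothetical sizes: a 256^3 grid shared by a 2 x 2 x 4 process grid.
    blocks = decompose((256, 256, 256), (2, 2, 4))
    print(len(blocks), "blocks")
    print("block (1, 0, 3):", blocks[(1, 0, 3)])
```

In a hybrid MPI + CUDA setting, each such block would typically be owned by one MPI rank and processed on that rank's GPU; how the authors map blocks to devices is not stated here.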
