...
Journal: Computer Physics Communications

Petascale turbulence simulation using a highly parallel fast multipole method on GPUs



Abstract

This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as numerical engine, and match the current record in mesh size for this application, a cube of 4096^3 computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as the numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method is able to achieve just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. The calculation time for one time step was 108 s for the vortex method and 154 s for the spectral method, under these conditions. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex-method calculations to date.
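
The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, it assumes one particle per grid point spread evenly over the MPI processes, and it assumes the sustained 1.08 petaflop/s rate applies across the whole 108 s time step.

```python
# Back-of-the-envelope checks on numbers quoted in the abstract.
# All helper names are hypothetical; the even-distribution and
# "sustained rate over the full step" assumptions are ours, not the paper's.

GRID_POINTS_PER_SIDE = 4096      # cube of 4096^3 computational points
MPI_PROCESSES = 4096             # one GPU per MPI process
SUSTAINED_FLOPS = 1.08e15        # 1.08 petaflop/s, single precision
VORTEX_STEP_SECONDS = 108.0      # one time step, vortex method
SPECTRAL_STEP_SECONDS = 154.0    # one time step, spectral method


def total_points(n_side: int) -> int:
    """Number of points (particles) in an n_side^3 cube."""
    return n_side ** 3


def points_per_process(n_side: int, n_procs: int) -> int:
    """Particles per MPI process, assuming an even split."""
    return total_points(n_side) // n_procs


def flop_per_step(flops: float, seconds: float) -> float:
    """Floating-point operations in one step at a constant rate."""
    return flops * seconds


if __name__ == "__main__":
    n = total_points(GRID_POINTS_PER_SIDE)
    print(f"total particles:      {n:,} (~{n / 1e9:.0f} billion)")
    print(f"particles per GPU:    "
          f"{points_per_process(GRID_POINTS_PER_SIDE, MPI_PROCESSES):,}")
    print(f"flop per time step:   "
          f"{flop_per_step(SUSTAINED_FLOPS, VORTEX_STEP_SECONDS):.3e}")
    print(f"spectral/vortex time: "
          f"{SPECTRAL_STEP_SECONDS / VORTEX_STEP_SECONDS:.2f}x")
```

Under these assumptions, 4096^3 comes to roughly 69 billion particles, consistent with the count stated in the abstract, and each GPU handles 4096^2 of them.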
