Journal of Parallel and Distributed Computing

Towards accelerating smoothed particle hydrodynamics simulations for free-surface flows on multi-GPU clusters



Abstract

Starting from the single graphics processing unit (GPU) version of the Smoothed Particle Hydrodynamics (SPH) code DualSPHysics, a multi-GPU SPH program is developed for free-surface flows. The approach is based on a spatial decomposition technique, whereby different portions (sub-domains) of the physical system under study are assigned to different GPUs. Communication between devices is achieved with Message Passing Interface (MPI) application programming interface (API) routines. The use of the radix sort algorithm for inter-GPU particle migration and sub-domain "halo" building (which enables interaction between SPH particles in different sub-domains) is described in detail. With the resulting scheme, simulations small enough to fit on a single GPU can be carried out faster than on one device alone, while accelerated simulations with up to 32 million particles, beyond the memory limits of a single GPU, can be performed on the current architecture. A study of the weak and strong scaling behaviour, speedups and efficiency of the resulting program is presented, including an investigation to elucidate the computational bottlenecks. Lastly, possibilities for reducing the effects of overhead on computational efficiency in future versions of the scheme are discussed.
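
The sub-domain and halo-exchange idea summarized above can be illustrated with a minimal sketch, not taken from the paper or from DualSPHysics: it assumes a 1-D slab decomposition along x with one MPI rank per GPU sub-domain, and exchanges boundary particles with neighbouring ranks via MPI_Sendrecv. A CPU std::sort by position stands in for the GPU radix sort by cell key described in the abstract, and all identifiers (Particle, halo_width, ...) are illustrative.

// Minimal sketch (not the authors' code) of the decomposition/halo idea:
// each MPI rank owns one slab of the domain along x; particles within one
// interaction radius (2h) of a slab boundary are copied to the neighbouring
// rank as a "halo". std::sort stands in for the GPU radix sort by cell key.
#include <mpi.h>
#include <algorithm>
#include <vector>
#include <cstdio>

struct Particle { double x, y, z; };           // positions only, for brevity

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const double domain_len = 1.0;               // global domain along x
    const double slab = domain_len / size;       // width of each sub-domain
    const double h = 0.01, halo_width = 2.0 * h; // SPH smoothing length

    // Each rank creates a few particles inside its own slab (toy data).
    std::vector<Particle> own;
    for (int i = 0; i < 1000; ++i) {
        double x = rank * slab + (i + 0.5) * slab / 1000.0;
        own.push_back({x, 0.0, 0.0});
    }

    // Sort by x so boundary particles end up contiguous -- a stand-in for
    // the cell-key radix sort the paper performs on the GPU.
    std::sort(own.begin(), own.end(),
              [](const Particle& a, const Particle& b) { return a.x < b.x; });

    // Select halo candidates for the left and right neighbours.
    std::vector<Particle> to_left, to_right;
    for (const Particle& p : own) {
        if (p.x < rank * slab + halo_width)       to_left.push_back(p);
        if (p.x > (rank + 1) * slab - halo_width) to_right.push_back(p);
    }

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    // Exchange halo counts first, then the particle payloads themselves.
    int nsendL = (int)to_left.size(), nsendR = (int)to_right.size();
    int nrecvL = 0, nrecvR = 0;
    MPI_Sendrecv(&nsendL, 1, MPI_INT, left, 0, &nrecvR, 1, MPI_INT, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&nsendR, 1, MPI_INT, right, 1, &nrecvL, 1, MPI_INT, left, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    std::vector<Particle> halo(nrecvL + nrecvR);  // each Particle = 3 doubles
    MPI_Sendrecv(to_left.data(),  nsendL * 3, MPI_DOUBLE, left,  2,
                 halo.data() + nrecvL, nrecvR * 3, MPI_DOUBLE, right, 2,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(to_right.data(), nsendR * 3, MPI_DOUBLE, right, 3,
                 halo.data(), nrecvL * 3, MPI_DOUBLE, left, 3,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    std::printf("rank %d: %zu own, %d halo particles received\n",
                rank, own.size(), nrecvL + nrecvR);
    MPI_Finalize();
    return 0;
}

Compile with mpicxx and run with mpirun -np N. In the scheme the abstract describes, the sort, halo selection and buffer packing would run on the GPU, with MPI moving the resulting buffers between devices; the CPU-side sketch above only shows the communication pattern.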

