Journal: Computer Physics Communications

OpenMP, OpenMP/MPI, and CUDA/MPI C programs for solving the time-dependent dipolar Gross-Pitaevskii equation


Abstract

We present new versions of the previously published C and CUDA programs for solving the dipolar Gross-Pitaevskii equation in one, two, and three spatial dimensions, which calculate stationary and nonstationary solutions by propagation in imaginary or real time. The presented programs are improved and parallelized versions of the previous programs, divided into three packages according to the type of parallelization. The first package contains improved and threaded versions of the sequential C programs using OpenMP. The second package additionally parallelizes the three-dimensional variants of the OpenMP programs using MPI, allowing them to run on distributed-memory systems. Finally, the previous three-dimensional CUDA-parallelized programs are further parallelized using MPI, in the same way as the OpenMP programs. We also present speedup test results obtained with the new versions of the programs in comparison with the previous sequential C and parallel CUDA programs. The improvements to the sequential version yield a speedup of 1.1-1.9, depending on the program. OpenMP parallelization yields a further speedup of 2-12 on a 16-core workstation, while the OpenMP/MPI version demonstrates a speedup of 11.5-16.5 when 32 nodes of a computer cluster are used. The CUDA/MPI version shows a speedup of 9-10 on a computer cluster with 32 nodes.
