Journal: Advances in Engineering Software

Graphics processing unit accelerated phase field dislocation dynamics: Application to bi-metallic interfaces


Abstract

We present the first high-performance computing implementation of the meso-scale phase field dislocation dynamics (PFDD) model on a graphics processing unit (GPU)-based platform. The implementation combines portable OpenACC directives with CUFFT, the fast Fourier transform (FFT) library of Nvidia's compute unified device architecture (CUDA), so that the FFT computations within the PFDD formulation execute on the same GPU platform. The overall implementation is termed ACCPFDD-CUFFT. The package is entirely performance portable due to the use of OpenACC-CUDA interoperability, in which calls to CUDA functions are replaced with OpenACC data regions for a host central processing unit (CPU) and device (GPU). A comprehensive benchmark study compares three FFT routines: the Numerical Recipes FFT (FOURN), the Fastest Fourier Transform in the West (FFTW), and CUFFT, the last of which exploits the advantages of the GPU hardware for FFT calculations. The novel ACCPFDD-CUFFT implementation is verified against the analytical solutions for the stress field around an infinite edge dislocation and subsequently applied to simulate the interaction and motion of dislocations through a bi-phase copper-nickel (Cu-Ni) interface. The ACCPFDD-CUFFT implementation on a single Tesla K80 GPU is shown to offer a 27.6X speedup relative to the serial version of the code and a 5X speedup relative to the version running on a 22-core Intel Xeon CPU E5-2699 v4 @ 2.20 GHz.
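The analytical benchmark named in the abstract is commonly taken to be the classical isotropic-elasticity solution for an infinite straight edge dislocation. The sketch below is not taken from the paper. It assumes the dislocation line lies along z with the Burgers vector along +x and the core at the origin, and the function name `edge_dislocation_stress` and the copper-like constants are hypothetical values chosen only for illustration.

```python
import math


def edge_dislocation_stress(x, y, mu, nu, b):
    """In-plane stresses of an infinite edge dislocation (isotropic elasticity).

    Assumed convention (not from the paper): line along z, Burgers vector
    of magnitude b along +x, core at the origin.
    Returns (sigma_xx, sigma_yy, sigma_xy) in the units of mu, provided
    x, y and b share the same length unit.
    """
    r2 = x * x + y * y
    if r2 == 0.0:
        raise ValueError("stress is singular at the dislocation core")
    # Common prefactor mu*b / (2*pi*(1 - nu)).
    d = mu * b / (2.0 * math.pi * (1.0 - nu))
    sigma_xx = -d * y * (3.0 * x * x + y * y) / r2**2
    sigma_yy = d * y * (x * x - y * y) / r2**2
    sigma_xy = d * x * (x * x - y * y) / r2**2
    return sigma_xx, sigma_yy, sigma_xy


# Hypothetical, roughly copper-like inputs (illustration only).
MU_GPA = 48.0   # shear modulus, GPa
NU = 0.34       # Poisson ratio
B_NM = 0.256    # Burgers vector magnitude, nm

if __name__ == "__main__":
    # Stress (GPa) at a point 10b to the right of and 5b above the core.
    print(edge_dislocation_stress(10 * B_NM, 5 * B_NM, MU_GPA, NU, B_NM))
```

A numerical stress field sampled on a grid could be compared against these values away from the core, where the continuum solution is finite. The shear component vanishes along the lines y = ±x, which gives a quick sanity check on sign conventions.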
