Conference: IEEE International Parallel and Distributed Processing Symposium

High Performance FFT Based Poisson Solver on a CPU-GPU Heterogeneous Platform


Abstract

We develop an optimized FFT-based Poisson solver on a CPU-GPU heterogeneous platform for the case when the input is too large to fit in GPU global memory. The solver involves memory-bound computations such as the 3D FFT, in which the large 3D data may have to be transferred over the PCIe bus several times during the computation. We develop a new strategy to decompose and allocate the computation between the GPU and the CPU such that the 3D data is transferred only once to the device memory, and the execution of the GPU kernels is almost completely overlapped with the PCIe data transfer. We achieve significantly better performance than has been reported in previous related work, including over 50 GFLOPS for three periodic boundary conditions and over 40 GFLOPS for two periodic and one Neumann boundary condition. The PCIe bus bandwidth achieved is over 5 GB/s, which is close to the best possible on our platform. For all the cases tested, the single 3D PCIe transfer time, which constitutes a lower bound on what is possible on our platform, accounts for almost 70% of the total execution time of the Poisson solver.
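To make the core idea of an FFT-based Poisson solver concrete, here is a minimal sketch for the simplest setting: a 1D problem with periodic boundary conditions, solved on the CPU. It is not the authors' 3D CPU-GPU implementation. The function names `dft` and `solve_poisson_periodic_1d` are hypothetical, and a naive O(n^2) transform stands in for a real FFT library so that the example needs only the standard library. The solver transforms the right-hand side, divides each Fourier mode by the negated squared wavenumber, and transforms back.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for j, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * math.pi * j * k / n)
        out.append(acc / n if inverse else acc)
    return out


def solve_poisson_periodic_1d(f, length):
    """Solve u'' = f on a periodic interval; returns the zero-mean solution."""
    n = len(f)
    f_hat = dft(f)
    u_hat = [0j] * n  # the k = 0 mode stays zero, which fixes the mean of u
    for k in range(1, n):
        m = k if k <= n // 2 else k - n  # signed integer frequency
        wavenumber = 2.0 * math.pi * m / length
        # In Fourier space, u'' = f becomes -(wavenumber^2) * u_hat = f_hat.
        u_hat[k] = -f_hat[k] / (wavenumber ** 2)
    return [value.real for value in dft(u_hat, inverse=True)]


# Check against a known solution: u'' = -sin(x) has the solution u = sin(x).
n = 32
length = 2.0 * math.pi
xs = [length * j / n for j in range(n)]
f = [-math.sin(x) for x in xs]
u = solve_poisson_periodic_1d(f, length)
max_error = max(abs(ui - math.sin(x)) for ui, x in zip(u, xs))
```

In three dimensions with periodic boundaries the same division is applied per mode using the sum of the squared wavenumbers along each axis, which is why the 3D FFT dominates the cost of the solver described in the abstract.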
