A hybrid GPU/CPU FFT library for large FFT problems


Abstract

Graphics processing units (GPUs) have proven to be a promising platform for accelerating large Fast Fourier Transform (FFT) computations. However, GPU performance is severely restricted by limited memory size and by the low bandwidth of data transfer over the PCI channel. Additionally, current GPU-based FFT implementations compute only on the GPU and use the CPU as a mere memory-transfer controller, so the computing power of the CPU is wasted. This paper proposes a hybrid parallel framework that uses both the multi-core CPU and the GPU in heterogeneous systems to compute large-scale 2D and 3D FFTs that exceed GPU memory. This work introduces a flexible partitioning scheme that enables concurrent execution on the CPU and GPU, and it integrates several FFT decomposition paradigms to tailor computation and communication. Moreover, our library exposes and exploits previously overlooked parallelism in FFT. Optimal load balancing is achieved automatically through effective performance modeling and an empirical tuning process. On average, our large-FFT library on the GeForce GTX480, Tesla C2070, and Tesla C2075 is 121% and 145% faster than 4-thread SSE-enabled FFTW and Intel MKL, respectively, with maximum speedups of 4.61 and 2.81.
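The partitioning idea can be illustrated with a minimal sketch. It is not the paper's library: it assumes a plain row-column decomposition of a 2D FFT, and both "devices" are ordinary CPU threads running NumPy, with one of them standing in for the GPU. The names `split_pass`, `hybrid_fft2`, and `gpu_fraction` are hypothetical, and the fixed split ratio replaces the performance model and empirical tuning the abstract describes.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def split_pass(a, fft_axis, gpu_fraction):
    """Run one pass of 1D FFTs along fft_axis, split between two workers.

    The 1D transforms along fft_axis are independent of each other, so the
    batch (the other axis) can be cut into a "GPU" share and a "CPU" share.
    Both shares run on CPU threads here (assumption made for illustration).
    """
    batch_axis = 1 - fft_axis
    cut = int(round(gpu_fraction * a.shape[batch_axis]))
    gpu_part, cpu_part = np.split(a, [cut], axis=batch_axis)
    with ThreadPoolExecutor(max_workers=2) as pool:
        gpu_job = pool.submit(np.fft.fft, gpu_part, axis=fft_axis)  # stand-in for the GPU
        cpu_job = pool.submit(np.fft.fft, cpu_part, axis=fft_axis)  # CPU share
        return np.concatenate([gpu_job.result(), cpu_job.result()], axis=batch_axis)


def hybrid_fft2(x, gpu_fraction=0.7):
    """2D FFT as a row pass followed by a column pass, each split in two."""
    rows_done = split_pass(x, fft_axis=1, gpu_fraction=gpu_fraction)
    return split_pass(rows_done, fft_axis=0, gpu_fraction=gpu_fraction)
```

Comparing the result against `np.fft.fft2` on a small random array is a quick way to check that the split does not change the transform. In a real hybrid system the ratio would be chosen so that both devices finish each pass at about the same time, and the "GPU" share would additionally be processed in chunks that fit in device memory.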