A hybrid GPU/CPU FFT library for large FFT problems

Abstract

Graphics processing units (GPUs) have proved to be a promising platform for accelerating large Fast Fourier Transform (FFT) computations. However, GPU performance is severely restricted by limited memory size and by the low bandwidth of data transfer over the PCI channel. Additionally, current GPU-based FFT implementations compute only on the GPU and employ the CPU as a mere memory-transfer controller, so the computing power of the CPU is wasted. This paper proposes a hybrid parallel framework that uses both multi-core CPUs and GPUs in heterogeneous systems to compute large-scale 2D and 3D FFTs that exceed GPU memory. This work introduces a flexible partitioning scheme that enables concurrent execution on the CPU and GPU and integrates several FFT decomposition paradigms to tailor computation and communication. Moreover, our library exposes and exploits previously overlooked parallelism in FFT. Optimal load balancing is achieved automatically through effective performance modeling and an empirical tuning process. On average, our large FFT library on the GeForce GTX480, Tesla C2070, and C2075 is 121% and 145% faster than 4-thread SSE-enabled FFTW and Intel MKL, with maximum speedups of 4.61 and 2.81, respectively.
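
The partitioning and load-balancing idea in the abstract can be illustrated with a minimal sketch. It is not the library's actual algorithm: it assumes a plain row-column decomposition of a 2D transform, a naive DFT standing in for an optimized 1D FFT kernel, and hypothetical per-row timings that a tuning step might supply. The names `split_rows` and `fft2_partitioned` are invented for this illustration, and the two batches run sequentially here, whereas a hybrid library would run them concurrently on the two devices.

```python
import cmath


def dft(x):
    """Naive O(n^2) 1D DFT, a stand-in for an optimized 1D FFT kernel."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


def split_rows(n_rows, cpu_time_per_row, gpu_time_per_row):
    """Return (gpu_rows, cpu_rows) so both devices finish at about the same time.

    The timings are assumed inputs (e.g. from a model or a benchmark run).
    The faster device receives a share inversely proportional to its time.
    """
    gpu_share = cpu_time_per_row / (cpu_time_per_row + gpu_time_per_row)
    gpu_rows = round(n_rows * gpu_share)
    return gpu_rows, n_rows - gpu_rows


def transpose(m):
    return [list(col) for col in zip(*m)]


def fft2_partitioned(a, cpu_time_per_row, gpu_time_per_row):
    """2D DFT by row-column decomposition, each pass split into two batches."""

    def pass_rows(m):
        gpu_rows, _ = split_rows(len(m), cpu_time_per_row, gpu_time_per_row)
        gpu_part = [dft(r) for r in m[:gpu_rows]]  # batch a GPU would take
        cpu_part = [dft(r) for r in m[gpu_rows:]]  # batch the CPU cores would take
        return gpu_part + cpu_part

    rows_done = pass_rows(a)  # 1D transforms along every row
    return transpose(pass_rows(transpose(rows_done)))  # then along every column


# If one row costs 3.0 time units on the CPU and 1.0 on the GPU,
# the GPU takes three quarters of the rows.
print(split_rows(1000, 3.0, 1.0))  # (750, 250)
```

The rows of each pass are independent 1D transforms, which is what makes the split between devices possible; the ratio returned by `split_rows` is the simplest balance point, ignoring transfer costs that a real performance model would have to include.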

Bibliographic details

Source
IEEE International Performance Computing and Communications Conference | 2013 | pp. 1-10 | 10 pages
Authors
Chen Shuo; Li Xiaoming
Original format: PDF

Similar literature

1. Introducing ToPe-FFT: An OpenCL-based FFT library targeting GPUs [J]. Bilal Jan, Fiaz Gul Khan, Bartolomeo Montrucchio. Concurrency, practice and experience. 2017, No. 21.
2. MPFFT: An Auto-Tuning FFT Library for OpenCL GPUs [J]. Yan Li, Yun-Quan Zhang, Yi-Qun Liu. Journal of Computer Science and Technology (English edition). 2013, No. 1.
3. CudaFilters: A SignalPlant library for GPU-accelerated FFT and FIR filtering [J]. Nejedly Petr, Plesinger Filip, Halamek Josef. Software. 2018, No. 1.
4. A hybrid GPU/CPU FFT library for large FFT problems [C]. Chen Shuo, Li Xiaoming. IEEE International Performance Computing and Communications Conference. 2013.
5. An Approach for Large-Scale Three-Dimensional FFT-Based Approximate Convolutions on GPUs [D]. Kulkarni, Anuva Abhijit. 2020.
6. DiSCaMB: a software library for aspherical atom model X-ray scattering factor calculations with CPUs and GPUs [O]. Michał L. Chodkiewicz, Szymon Migacz, Witold Rudnicki.
7. Fast Simulation of Laser Heating Processes on Thin Metal Plates with FFT Using CPU/GPU Hardware [O]. Daniel Mejia-Parra, Ander Arbelaiz, Oscar Ruiz-Salguero. 2020.
