An Empirically Tuned 2D and 3D FFT Library on CUDA GPU

Abstract

In this paper, we propose a framework for computing multidimensional FFTs on GPUs based on the Cooley-Tukey algorithm. The framework uses an I/O tensor representation to generalize the decomposition of multidimensional FFTs on GPUs, and therefore provides a systematic description of the possible FFT implementations on GPUs. It is designed for efficient multidimensional FFT on GPU architectures. In particular, no global transposition among dimensions is performed, and some previously unnoticed ways of grouping and commuting multiple dimensions are identified in order to reduce the number of computational kernels and minimize the number of global memory accesses. The framework also takes into account important architectural factors and constraints of CUDA, such as coalesced access, bank conflicts, and register pressure. Moreover, we adapt codelets, the straight-line FFT implementations originally developed in FFTW, into our framework and demonstrate that they are highly efficient on GPUs.

A 2D and 3D FFT library, currently supporting power-of-two sizes, is implemented on this framework, and its empirically tuned results are compared with CUFFT and other recent publications on three NVIDIA GPUs. On a high-end NVIDIA GPU, the GeForce GTX 280, our 2D implementation is on average 2.8x faster than CUFFT and 1.6x faster than the best previously published results. Our 3D FFT implementation achieves an average speedup of 22.7x over CUFFT. Furthermore, both implementations show better precision than CUFFT. The library and its framework are potentially extensible to more general FFT problem sizes and to other parallel architectures.
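The abstract describes the approach only at a high level. As a minimal sketch (plain Python, not the authors' CUDA code), the following illustrates three ideas it mentions: a radix-2 Cooley-Tukey recursion for power-of-two sizes, a straight-line "codelet" used as the base case (hand-written here for size 4, whereas FFTW generates its codelets automatically), and a 2D FFT computed as 1D FFTs along each dimension without first transposing the whole grid, with the two dimensions processed in either order. All function names (`codelet4`, `fft`, `fft_along_axis`, `fft2`, `dft`) are hypothetical, and the sketch does not model GPU concerns such as coalescing, bank conflicts, or register pressure.

```python
import cmath


def codelet4(x0, x1, x2, x3):
    """Size-4 forward DFT written as straight-line code (no loops, no twiddle table).

    Hand-written for illustration; FFTW generates its codelets automatically.
    """
    # Size-2 butterflies on the even-indexed and odd-indexed inputs.
    a0, a1 = x0 + x2, x0 - x2
    b0, b1 = x1 + x3, x1 - x3
    # The only nontrivial twiddle factor for N = 4 is exp(-2*pi*i/4) = -1j.
    return [a0 + b0, a1 - 1j * b1, a0 - b0, a1 + 1j * b1]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT for power-of-two lengths."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("only power-of-two sizes are supported")
    if n == 1:
        return [complex(x[0])]
    if n == 4:
        return codelet4(*x)  # straight-line base case instead of recursing further
    even = fft(x[0::2])
    odd = fft(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd half
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def fft_along_axis(grid, axis):
    """Apply a 1D FFT to every row (axis=1) or every column (axis=0) of a 2D list."""
    rows, cols = len(grid), len(grid[0])
    if axis == 1:
        return [fft(row) for row in grid]
    out = [[0j] * cols for _ in range(rows)]
    for c in range(cols):
        # Gather one column at a time (a strided read) rather than transposing the whole grid.
        column = fft([grid[r][c] for r in range(rows)])
        for r in range(rows):
            out[r][c] = column[r]
    return out


def fft2(grid, order=(0, 1)):
    """2D FFT as 1D FFTs along each dimension; the dimension order is interchangeable."""
    for axis in order:
        grid = fft_along_axis(grid, axis)
    return grid


def dft(x):
    """Direct O(N^2) DFT, used only as a reference to check fft()."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]
```

For a power-of-two-sized grid `g`, `fft2(g, (0, 1))` and `fft2(g, (1, 0))` agree up to rounding error, which shows in the simplest setting why the order of the dimensions can be rearranged. Comparing `fft(x)` with `dft(x)` is a quick way to check the recursion and the codelet.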

Bibliographic details

Source: 24th ACM International Conference on Supercomputing 2010, 2010, pp. 305-314 (10 pages)
Conference location: Amsterdam, Netherlands
Authors: Liang Gu; Xiaoming Li; Jakob Siegel
Affiliations: Department of ECE, University of Delaware, Newark, DE, USA (all three authors)
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords: 2D FFT; 3D FFT; library generation; empirical tuning; GPU; CUDA

Similar documents

1. MPFFT: An Auto-Tuning FFT Library for OpenCL GPUs [J]. Yan Li, Yun-Quan Zhang, Yi-Qun Liu. Journal of Computer Science and Technology, 2013, no. 1.
2. Improved CUDA programs for GPU computing of Swendsen-Wang multi-cluster spin flip algorithm: 2D and 3D Ising, Potts, and XY models [J]. Yukihiro Komura, Yutaka Okabe. Computer Physics Communications, 2016.
3. CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm: 2D and 3D Ising, Potts, and XY models [J]. Yukihiro Komura, Yutaka Okabe. Computer Physics Communications, 2014, no. 3.
4. An Empirically Tuned 2D and 3D FFT Library on CUDA GPU [C]. Liang Gu, Xiaoming Li, Jakob Siegel. ACM International Conference on Supercomputing, 2010.
5. Efficient GPU Parallelization of the Agent-Based Models Using MASS CUDA Library [D]. Elizaveta Kosiachenko. 2018.
6. Parallel implementation of 3D protein structure similarity searches using a GPU and the CUDA [O]. Dariusz Mrozek, Miłosz Brożek, Bożena Małysiak-Mrozek.
7. CUDA programs for GPU computing of Swendsen-Wang multi-cluster spin flip algorithm: 2D and 3D Ising, Potts, and XY models [O]. Yukihiro Komura, Yutaka Okabe. 2014.
