Automatic FFT Performance Tuning on OpenCL GPUs

Abstract

Many fields of science and engineering, such as astronomy, medical imaging, seismology and spectroscopy, have been revolutionized by Fourier methods. The fast Fourier transform (FFT) is an efficient algorithm for computing the discrete Fourier transform (DFT) and its inverse. Emerging high-performance computing architectures, such as GPUs, seek much higher performance and efficiency by exposing a hierarchy of distinct memories to programmers. However, the complexity of GPU programming poses a significant challenge for programmers. In this paper, based on the Kronecker product form of multi-dimensional FFTs, we propose an automatic performance tuning framework for various OpenCL GPUs. Several key techniques for GPU programming on AMD and NVIDIA GPUs are also identified. Our OpenCL FFT library achieves up to 1.5 to 4 times, 1.5 to 40 times and 1.4 times the performance of clAmdFft 1.0 for 1D, 2D and 3D FFTs respectively on an AMD GPU, and its overall performance is within 90% of CUFFT 4.0 on two NVIDIA GPUs.
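The Kronecker product form mentioned in the abstract can be illustrated with a small sketch. It is not the paper's implementation and not GPU code: it uses naive dense DFT matrices rather than an FFT, and every function name here (`dft_matrix`, `kron`, `dft2_row_column`, `dft2_kronecker`) is hypothetical. It only checks, for a tiny input, that a 2D DFT computed row by row and then column by column agrees with applying the Kronecker product of the two 1D DFT matrices to the row-major flattening of the input.

```python
import cmath


def dft_matrix(n):
    """Return the n x n DFT matrix with entries exp(-2*pi*i*j*k/n)."""
    return [[cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)]
            for j in range(n)]


def kron(a, b):
    """Kronecker product of two dense matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def matvec(m, v):
    """Dense matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def dft2_row_column(x):
    """2D DFT: 1D DFT along every row, then along every column."""
    rows, cols = len(x), len(x[0])
    f_rows, f_cols = dft_matrix(rows), dft_matrix(cols)
    step = [matvec(f_cols, row) for row in x]
    out_cols = [matvec(f_rows, [step[r][c] for r in range(rows)])
                for c in range(cols)]
    # out_cols is indexed [column][row]; transpose back to [row][column].
    return [[out_cols[c][r] for c in range(cols)] for r in range(rows)]


def dft2_kronecker(x):
    """2D DFT as (F_rows kron F_cols) applied to the row-major flattening."""
    rows, cols = len(x), len(x[0])
    big = kron(dft_matrix(rows), dft_matrix(cols))
    flat = [v for row in x for v in row]
    y = matvec(big, flat)
    return [y[r * cols:(r + 1) * cols] for r in range(rows)]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


if __name__ == "__main__":
    sample = [[1, 2, 3], [4, 5, 6]]
    print(max_abs_diff(dft2_row_column(sample), dft2_kronecker(sample)))
```

The dense Kronecker matrix is only practical for tiny sizes; the identity matters because it lets a multi-dimensional transform be factored into batches of 1D transforms, which is the kind of structure an auto-tuner can search over.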