Conference: IEEE International Parallel and Distributed Processing Symposium

Large Bandwidth-Efficient FFTs on Multicore and Multi-socket Systems

Abstract

Current microprocessor trends show a steady increase in the number of cores and/or threads present on the same CPU die. While this increase improves performance for compute-bound applications, the benefits for memory-bound applications are limited. The discrete Fourier transform (DFT) is an example of such a memory-bound application, where increasing the number of cores does not yield a corresponding increase in performance. In this paper, we present an alternative solution for using the increased number of cores/threads available on a typical multicore system. We propose to repurpose some of the cores/threads as soft Direct Memory Access (DMA) engines so that data is moved on and off chip while computation is performed. Overlapping memory accesses with computation permits us to preload and reshape data so that computation is more efficient. We show that despite using fewer cores/threads for computation, our approach improves performance relative to MKL and FFTW by 1.2x to 3x for large multi-dimensional DFTs of up to 2048^3 on one- and two-socket Intel and AMD systems.
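
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: one thread acts as a "soft DMA" data mover that gathers strided data into contiguous buffers, while a second thread computes on buffers that are already staged. Every name here (`column_dfts_with_soft_dma`, `naive_dft`, `depth`) is hypothetical and does not come from the paper, and a naive O(n^2) DFT stands in for a real FFT kernel. Because of Python's global interpreter lock, this shows only the pipeline structure, not the speedups the authors report.

```python
import cmath
import queue
import threading


def naive_dft(x):
    """O(n^2) DFT of a sequence; a stand-in for an optimized FFT kernel."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def column_dfts_with_soft_dma(matrix, depth=2):
    """Return the DFT of every column of a row-major matrix (list of rows).

    A data-mover thread gathers each strided column into a contiguous
    buffer while a compute thread transforms previously staged buffers.
    `depth` bounds how many buffers may be staged ahead (2 = double buffering).
    """
    rows, cols = len(matrix), len(matrix[0])
    staged = queue.Queue(maxsize=depth)  # bounded hand-off between the threads
    results = [None] * cols

    def data_mover():
        # Assumed role of the "soft DMA" thread: load and reshape only.
        for c in range(cols):
            buf = [matrix[r][c] for r in range(rows)]  # strided gather
            staged.put((c, buf))  # blocks once `depth` buffers are waiting
        staged.put(None)  # sentinel: no more data

    def compute():
        # Compute thread: touches only contiguous, already staged buffers.
        while True:
            item = staged.get()
            if item is None:
                break
            c, buf = item
            results[c] = naive_dft(buf)

    threads = [
        threading.Thread(target=data_mover),
        threading.Thread(target=compute),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `column_dfts_with_soft_dma([[1, 2], [3, 4]])` transforms the columns `[1, 3]` and `[2, 4]`. The bounded queue is the design choice that matters: it keeps the data mover at most `depth` buffers ahead of the compute thread, so staging memory stays small while loading and computation can overlap.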