Conference: IEEE International Parallel and Distributed Processing Symposium

Large Bandwidth-Efficient FFTs on Multicore and Multi-socket Systems


Abstract

Current microprocessor trends show a steady increase in the number of cores and/or threads present on the same CPU die. While this increase improves performance for compute-bound applications, the benefits for memory-bound applications are limited. The discrete Fourier transform (DFT) is an example of such a memory-bound application, where increasing the number of cores does not yield a corresponding increase in performance. In this paper, we present an alternative solution for using the increased number of cores/threads available on a typical multicore system. We propose to repurpose some of the cores/threads as soft Direct Memory Access (DMA) engines so that data is moved on and off chip while computation is performed. Overlapping memory accesses with computation permits us to preload and reshape data so that computation is more efficient. We show that despite using fewer cores/threads for computation, our approach improves performance relative to MKL and FFTW by 1.2x to 3x for large multi-dimensional DFTs of up to 2048^3 on one- and two-socket Intel and AMD systems.
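The overlap of data movement with computation described in the abstract can be illustrated with a double-buffering sketch. This is not the paper's implementation: the function names (`fft`, `fft_rows_overlapped`), the two-buffer scheme, and the use of queues for hand-off are all assumptions made for illustration. One helper thread stands in for a "soft DMA engine" and stages the next block of rows while the main thread transforms the block staged before it. In CPython the global interpreter lock prevents any real speedup, so the sketch shows only the structure of the hand-off, not the performance effect.

```python
import cmath
import queue
import threading


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def fft_rows_overlapped(rows, block_rows=2):
    """Row-wise FFT where a helper thread stages blocks into two buffers.

    Hypothetical illustration: the helper thread copies the next block of
    rows into a free staging buffer while the main thread computes on the
    buffer that was filled previously.
    """
    n_rows = len(rows)
    out = [None] * n_rows
    buffers = [[], []]          # two staging buffers (double buffering)
    free = queue.Queue()        # indices of buffers the mover may overwrite
    ready = queue.Queue()       # (buffer index, first row) ready for compute
    free.put(0)
    free.put(1)

    def mover():
        # Data-movement role: copy blocks in order, waiting for a free buffer.
        for start in range(0, n_rows, block_rows):
            idx = free.get()
            buffers[idx] = [list(r) for r in rows[start:start + block_rows]]
            ready.put((idx, start))
        ready.put(None)         # sentinel: no more blocks

    worker = threading.Thread(target=mover)
    worker.start()
    while True:
        item = ready.get()
        if item is None:
            break
        idx, start = item
        # Compute role: transform the staged rows, then release the buffer.
        for offset, row in enumerate(buffers[idx]):
            out[start + offset] = fft(row)
        free.put(idx)
    worker.join()
    return out
```

Calling `fft_rows_overlapped([[1, 0, 0, 0], [1, 1, 1, 1]])` returns the transform of each row. With only two buffers the mover can run at most two blocks ahead of the compute thread, which bounds the staging memory.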
