Large Bandwidth-Efficient FFTs on Multicore and Multi-socket Systems

Abstract

Current microprocessor trends show a steady increase in the number of cores and/or threads present on the same CPU die. While this increase improves performance for compute-bound applications, the benefits for memory-bound applications are limited. The discrete Fourier transform (DFT) is an example of such a memory-bound application: increasing the number of cores does not yield a corresponding increase in performance. In this paper, we present an alternative way to use the increased number of cores/threads available on a typical multicore system. We propose to repurpose some of the cores/threads as soft Direct Memory Access (DMA) engines that move data on and off chip while the remaining cores/threads perform computation. Overlapping memory accesses with computation lets us preload and reshape data so that computation is more efficient. We show that despite using fewer cores/threads for computation, our approach improves performance relative to MKL and FFTW by 1.2x to 3x for large multi-dimensional DFTs of up to 2048³ on one- and two-socket Intel and AMD systems.
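
The abstract describes the soft-DMA idea only at a high level. The following is a minimal sketch of the general pattern, not the authors' implementation. It assumes a 2D DFT computed as row FFTs followed by column FFTs. One helper Python thread stands in for a "soft DMA" engine: it gathers strided columns into contiguous staging buffers (the preload-and-reshape step) while the calling thread runs FFTs on the previously staged block. The function name `fft2_with_soft_dma`, the `block` parameter, and the two-buffer (double-buffering) scheme are illustrative assumptions. Python's global interpreter lock may limit how much real overlap occurs, so the sketch shows the structure of the technique rather than its performance.

```python
import queue
import threading

import numpy as np


def fft2_with_soft_dma(x: np.ndarray, block: int = 64) -> np.ndarray:
    """2D DFT as row FFTs followed by column FFTs (hypothetical sketch).

    The column pass uses a helper thread acting as an assumed 'soft DMA'
    engine: it gathers a block of strided columns into a contiguous staging
    buffer while the calling thread computes FFTs on a previously staged block.
    """
    n_rows, n_cols = x.shape
    # Stage 1: FFT along rows (contiguous for a C-ordered array, no staging).
    tmp = np.fft.fft(x, axis=1)
    out = np.empty_like(tmp)

    # Two staging buffers: at most two blocks are in flight (double buffering).
    free: queue.Queue = queue.Queue()
    ready: queue.Queue = queue.Queue()
    for _ in range(2):
        free.put(np.empty((block, n_rows), dtype=tmp.dtype))

    def dma_loader() -> None:
        # 'Soft DMA' thread: reshape strided columns into contiguous rows.
        for j0 in range(0, n_cols, block):
            j1 = min(j0 + block, n_cols)
            buf = free.get()                  # wait for an empty buffer
            buf[: j1 - j0] = tmp[:, j0:j1].T  # gather + transpose (the reshape)
            ready.put((j0, j1, buf))
        ready.put(None)                       # end-of-stream marker

    loader = threading.Thread(target=dma_loader)
    loader.start()
    # Compute thread: FFT each staged block along its now-contiguous axis.
    while (item := ready.get()) is not None:
        j0, j1, buf = item
        out[:, j0:j1] = np.fft.fft(buf[: j1 - j0], axis=1).T
        free.put(buf)                         # hand the buffer back to the loader
    loader.join()
    return out
```

As a quick check, `np.allclose(fft2_with_soft_dma(x), np.fft.fft2(x))` should hold for any 2D array `x`. Recycling a fixed pair of buffers keeps the loader from running arbitrarily far ahead of the computation, which mirrors the idea of overlapping the next transfer with the current computation.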

机译：当前的微处理器趋势表明，在同一CPU芯片上存在的内核和/或线程数量稳步增长。尽管这种增加提高了计算绑定应用程序的性能，但是内存绑定应用程序的好处是有限的。离散傅里叶变换（DFT）是此类内存绑定应用程序的示例，其中增加内核数不会带来相应的性能提升。在本文中，我们提出了一种替代解决方案，用于在典型的多核系统上使用数量增加的内核/线程。我们建议将某些内核/线程重新用作软直接内存访问（DMA）引擎，以便在执行计算时将数据移入和移出芯片。将内存访问与计算重叠可以使我们预加载和重塑数据，从而使计算效率更高。我们证明，尽管在一个和两个插槽的Intel和AMD系统上，对于高达2048 ^ 3的大型多维DFT，我们的方法相对于MKL和FFTW而言，相对于MKL和FFTW而言，其性能提高了1.2倍至3倍。

Bibliographic Record

Source: IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2018, pp. 379-388 (10 pages)
Authors: Doru Thom Popovici; Tze Meng Low; Franz Franchetti
Original format: PDF
Keywords: Discrete Fourier transforms; Instruction sets; Multicore processing; Three-dimensional displays; Matrix decomposition; Memory management

机译：离散傅里叶变换;指令集;多核处理;三维显示;矩阵分解;内存管理;

Similar Literature

1. Towards large-scale multi-socket, multicore parallel simulations: Performance of an MPI-only semiconductor device simulator [J]. Lin P.T., Shadid J.N. Journal of Computational Physics, 2010, no. 19.
2. A Highly Efficient Multicore Floating-Point FFT Architecture Based on Hybrid Linear Algebra/FFT Cores [J]. Ardavan Pedram, John D. McCalpin, Andreas Gerstlauer. Journal of Signal Processing Systems for Signal, Image, and Video Technology, 2014, no. 1a2.
3. Performance Modeling of Parallel Loops on Multi-Socket Platforms Using Queueing Systems [J]. IEEE Transactions on Parallel and Distributed Systems, 2020, no. 2.
4. Large Bandwidth-Efficient FFTs on Multicore and Multi-socket Systems [C]. Doru Thom Popovici, Tze Meng Low, Franz Franchetti. IEEE International Parallel and Distributed Processing Symposium, 2018.
5. Bandwidth-efficient communication systems based on finite-length low density parity check codes [D]. Vu, Huy G. 2006.
6. Multicore Assemblies from Three-Component Linear Homo-Copolymer Systems: A Coarse-Grained Modeling Study [O]. Sousa Javan Nikkhah, Elsi Turunen, Anneli Lepo. 2021.
7. Scheduling Task Parallelism on Multi-Socket Multicore Systems [O]. Stephen L. Olivier, Allan K. Porterfield, Jan F. Prins. 2012.