International Journal of High Performance Computing Applications

Large-scale fast Fourier transform on a heterogeneous multi-core system

Abstract

As interest in hybrid computing systems increases, people are eager to find new ways to exploit the unique and efficient computational power of heterogeneous multi-core systems. Although there has been much interest in implementing high-performance fast Fourier transform (FFT) libraries on this kind of system, most existing libraries focus on small-scale FFTs whose data can fit in the local storage of a single accelerator. Real-world FFT applications often require much larger FFTs, but implementing large FFTs efficiently is extremely challenging on heterogeneous multi-core systems with distributed architectures. In this paper, we introduce the first known FFT library for heterogeneous multi-core systems with distributed architectures that can solve one-dimensional FFTs larger than what fits in a single accelerator. Our implementation achieves a 67% performance improvement over FFTW 3.2.2 (Fastest Fourier Transform in the West) and sustains over 36 single-precision FFT GFLOPS 'end-to-end'. Achieving such high performance requires novel schemes for large-scale FFT factorization, data permutation, and all-to-all exchanges. It also requires buffer designs that maximize use of the local storage while minimizing communication overhead. One important finding in this paper is that large-scale FFTs on this kind of architecture are data-transfer bound, unlike on other architectures. A significant contribution of this paper is that, for each major component of our algorithm, we explore many possible design options and present quantitative performance comparisons. This provides value beyond the specific architecture, as it illustrates the fundamental features associated with different communication paradigms and mechanisms for heterogeneous multi-core systems. Today's computer systems are increasingly being designed to include general-purpose accelerators. The techniques in this paper can also be applied to these architectures, especially when they have limited local storage or steep cache hierarchies. We also provide insights into applying these techniques to similar architectures.
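The abstract does not spell out its factorization scheme. As a hedged illustration of the general idea only, the sketch below uses the classic four-step (Bailey-style) decomposition. It splits an N = n1 * n2 point transform into many small FFTs, each of which could fit in one accelerator's local storage. Twiddle-factor multiplications and a transpose join the small FFTs together. This is not the authors' algorithm. The functions `fft_radix2`, `four_step_fft`, and `dft` are hypothetical names, sizes are assumed to be powers of two, and everything runs on a single CPU in plain Python.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.

    Hypothetical stand-in for a small "local" FFT whose data fits in one
    accelerator's local storage (an assumption for illustration).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def four_step_fft(x, n1, n2):
    """N = n1 * n2 point DFT built from n2 FFTs of length n1 and n1 FFTs
    of length n2. Input index n = n2*a + b, output index k = k1 + n1*k2."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Step 1: n2 independent length-n1 FFTs over strided "columns"
    # x[n2*a + b], a = 0..n1-1 (a strided gather, i.e. a data permutation).
    cols = [fft_radix2([x[n2 * a + b] for a in range(n1)]) for b in range(n2)]
    # Step 2: twiddle factors W_N^(b*k1) couple the two sub-problems.
    for b in range(n2):
        for k1 in range(n1):
            cols[b][k1] *= cmath.exp(-2j * cmath.pi * b * k1 / n)
    # Step 3: transpose. If the columns lived on different accelerators,
    # this regrouping is where an all-to-all exchange would be needed.
    # Step 4: n1 independent length-n2 FFTs over the transposed data.
    rows = [fft_radix2([cols[b][k1] for b in range(n2)]) for k1 in range(n1)]
    # Scatter results back to natural order: X[k1 + n1*k2].
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2]
    return out


def dft(x):
    """Naive O(N^2) DFT used only as a reference for checking results."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


if __name__ == "__main__":
    signal = [complex(i % 5, (i * 3) % 7) for i in range(32)]
    ref = dft(signal)
    got = four_step_fft(signal, 4, 8)
    print(max(abs(a - b) for a, b in zip(ref, got)) < 1e-9)  # prints True
```

In this sketch the arithmetic is confined to the small FFTs and the twiddle loop. The strided gather and the transpose only move data. On a distributed machine, those movements correspond to the data permutation and all-to-all exchanges named above. That traffic is one plausible reason a large FFT can become data-transfer bound rather than compute bound.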
