International Euro-Par Parallel Processing Conference; 30 August to 2 September 2005; Lisbon (PT)

Performance Measurements of the 3D FFT on the Blue Gene/L Supercomputer

Abstract

This paper presents performance characteristics of a communications-intensive kernel, the complex data 3D FFT, running on the Blue Gene/L architecture. Two implementations of the volumetric FFT algorithm were characterized: one built on the MPI library using an optimized collective all-to-all operation, and another built on a low-level System Programming Interface (SPI) of the Blue Gene/L Advanced Diagnostics Environment (BG/L ADE). We compare the current results to those obtained using a reference MPI implementation (MPICH2 ported to BG/L with unoptimized collectives) and to a port of version 2.1.5 of the FFTW library. Performance experiments on the Blue Gene/L prototype indicate that both of our implementations scale well, and the current MPI-based implementation shows a speedup of 730 on 2048 nodes for 3D FFTs of size 128 x 128 x 128. Moreover, the volumetric FFT outperforms the FFTW port by a factor of 8 for a 128 x 128 x 128 complex FFT on 2048 nodes.
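
To put the quoted figures in perspective, the short Python sketch below works out the parallel efficiency implied by a speedup of 730 on 2048 nodes, and the average number of grid points each node holds for a 128 x 128 x 128 transform. The function names are illustrative and do not come from the paper. The sketch also assumes that the speedup is measured relative to a single node and that the grid is distributed evenly, neither of which the abstract states.

```python
def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Fraction of ideal linear speedup achieved (speedup / node count)."""
    return speedup / nodes


def points_per_node(n: int, nodes: int) -> float:
    """Average number of complex grid points per node for an n x n x n FFT,
    assuming an even distribution (an assumption, not stated in the abstract)."""
    return n ** 3 / nodes


# Figures quoted in the abstract: speedup of 730 on 2048 nodes, 128^3 grid.
efficiency = parallel_efficiency(730, 2048)
per_node = points_per_node(128, 2048)

print(f"parallel efficiency: {efficiency:.3f}")   # prints 0.356
print(f"grid points per node: {per_node:.0f}")    # prints 1024
```

Under these assumptions each node holds only about 1024 complex points, so little local computation is available to offset the cost of the all-to-all exchanges. This is consistent with the abstract's description of the kernel as communications-intensive.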
