Journal: Journal of computational science

Computation-communication overlap and parameter auto-tuning for scalable parallel 3-D FFT



Abstract

Parallel 3-D FFT is widely used in scientific applications, so it is important to achieve high performance on large-scale systems with many thousands of computing cores. This paper describes a new method for scalable high-performance parallel 3-D FFT. We use a 2-D decomposition of 3-D arrays to scale to a large number of cores. To achieve high performance, we use non-blocking MPI all-to-all operations and exploit computation-communication overlap. We also auto-tune our 3-D FFT code efficiently over a large parameter space and cope with the complex trade-offs in optimizing our code for various system environments. According to experimental results from two systems, our method computes parallel 3-D FFT significantly faster than three existing libraries and scales well to at least 32,768 compute cores. (C) 2015 Elsevier B.V. All rights reserved.
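
To illustrate why a 2-D decomposition admits more cores than a 1-D one, the following is a minimal sketch and not the paper's code. It assumes a cubic n x n x n array, a process grid of `p_rows` x `p_cols` ranks, and a layout in which the x axis stays whole on every rank while y and z are split. The function names, the 256^3 array size, and the 128 x 256 grid are all hypothetical choices made for this example.

```python
def block_range(n, parts, index):
    """Half-open [start, stop) range of block `index` when n items
    are split into `parts` near-equal contiguous blocks."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def pencil_extent(n, grid, coords):
    """Local sub-array ("pencil") of an n^3 array owned by the rank at
    `coords` in a p_rows x p_cols process grid (assumed layout:
    x kept whole, y split across rows, z split across columns)."""
    p_rows, p_cols = grid
    row, col = coords
    return (0, n), block_range(n, p_rows, row), block_range(n, p_cols, col)


def max_ranks_1d(n):
    """A 1-D (slab) decomposition splits a single axis, so at most
    n ranks can each own a non-empty slab of an n^3 array."""
    return n


def max_ranks_2d(n):
    """A 2-D (pencil) decomposition splits two axes, so up to
    n * n ranks can each own a non-empty pencil."""
    return n * n


if __name__ == "__main__":
    n = 256            # hypothetical array size
    grid = (128, 256)  # hypothetical grid: 128 * 256 = 32,768 ranks
    print("1-D limit:", max_ranks_1d(n))
    print("2-D limit:", max_ranks_2d(n))
    print("last rank owns:", pencil_extent(n, grid, (127, 255)))
```

Under these assumptions, a 256^3 array could occupy at most 256 ranks with a 1-D decomposition, whereas the 2-D decomposition leaves every rank of a 32,768-rank grid with a non-empty pencil.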
