首页> 外文期刊>Parallel Computing >Parallel Fast Multipole Method accelerated FFT on HPC clusters
【24h】

Parallel Fast Multipole Method accelerated FFT on HPC clusters

机译:并行快速多极方法加速HPC集群的FFT

获取原文
获取原文并翻译 | 示例

摘要

With increasing sizes of distributed systems, there comes an increased risk of communication bottlenecks. In the past decade there has been a growing interest in communication-avoiding algorithms. The distributed memory Fast Fourier Transform is an important algorithm which suffers from major communication bottlenecks. In this work, we take a look at an existing communication-avoiding algorithm FMM-FFT, an alternative to FFT which utilizes the Fast Multipole Method (FMM) to reduce communications to a single all-to-all communication. We present a detailed implementation of FMM-FFT relying on modern libraries and demonstrate it on two distinct distributed memory architectures notably a traditional Intel Xeon based HPC cluster and then a Beowulf cluster. We show that while the FMM-FFT is significantly slower than FFT on the traditional HPC cluster, on the Beowulf cluster it outperforms standard FFT, consistently getting speedups of 1.5x or more against FFTW. We then proceed to show how the communication to computation cost metric is important and useful in explaining the performance results of FMM-FFT against standard FFT. The source code pertaining to this work is being made publicly available under a permissive open source licence at Github.
机译:随着分布式系统的增加,通信瓶颈的风险增加了。在过去的十年中,在避免避免算法中存在越来越多的兴趣。分布式存储器快速傅立叶变换是一种重要的算法,其遭受了主要的通信瓶颈。在这项工作中,我们看一下现有的通信避免算法FMM-FFT,一种用于FFT的替代方法,它利用快速多极方法(FMM)来减少单个全部通信的通信。我们介绍了依靠现代库的FMM-FFT的详细实施,并在两个不同的分布式内存架构上演示它特别是传统的英特尔Xeon基础的HPC群集,然后是Beowulf集群。我们认为,虽然FMM-FFT在传统HPC集群上的FFT显着慢,但在Beowulf集群上,它优于标准FFT,始终如一地获得1.5倍或更多的加速对抗FFTW。然后,我们继续展示与计算成本度量的通信是如何重要的,并且有助于解释与标准FFT的FMM-FFT的性能结果。与此工作有关的源代码在GitHub的允许开源许可下公开可用。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号