We propose an efficient algorithm for implementing the 3D NAS FFT benchmark. The algorithm overlaps communication with computation. On parallel machines that support such overlap, it can outperform a non-overlapping version of the same algorithm by a factor close to two.
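A simple cost model shows why the gain from overlap is bounded by about two. The sketch below assumes a two-stage pipeline. In this model the FFT data are split into equal chunks, and one chunk is communicated while the next is computed. The chunking scheme and the names `serial_time`, `pipelined_time` and `speedup` are hypothetical illustrations, not the authors' algorithm. Without overlap the cost is t_comp + t_comm. With many chunks the overlapped cost approaches max(t_comp, t_comm), so the ratio is at most 2. It reaches 2 only when the two costs are equal.

```python
def serial_time(t_comp: float, t_comm: float) -> float:
    """Non-overlapped cost: all computation, then all communication."""
    return t_comp + t_comm


def pipelined_time(t_comp: float, t_comm: float, chunks: int) -> float:
    """Hypothetical two-stage pipeline over `chunks` equal pieces.

    Piece i is communicated while piece i+1 is computed. The first compute
    fills the pipeline, the last communication drains it, and the remaining
    chunks-1 steps are each bounded by the slower stage.
    """
    c = t_comp / chunks
    m = t_comm / chunks
    return c + m + (chunks - 1) * max(c, m)


def speedup(t_comp: float, t_comm: float, chunks: int) -> float:
    """Ratio of non-overlapped to overlapped cost under this model."""
    return serial_time(t_comp, t_comm) / pipelined_time(t_comp, t_comm, chunks)


if __name__ == "__main__":
    # Balanced case: computation and communication take equal time.
    for n in (1, 4, 16, 64):
        print(f"chunks={n:3d}  speedup={speedup(1.0, 1.0, n):.3f}")
```

With one chunk there is no overlap and the speedup is 1. As the number of chunks grows in the balanced case, the speedup approaches 2. If communication is much cheaper than computation, the attainable gain is correspondingly smaller. This model ignores per-message latency, which in practice limits how finely the data can usefully be split.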