In this paper, we propose high-performance parallel one-dimensional fast Fourier transform (FFT) algorithms for distributed-memory parallel computers with vector symmetric multiprocessor (SMP) nodes. The four-step FFT algorithm can be extended to a five-step FFT algorithm that lengthens the innermost loop. We use the four-step and five-step algorithms to implement the parallel one-dimensional FFT algorithms. Because the proposed parallel FFT algorithms use cyclic distribution, all-to-all communication takes place only once. Moreover, both the input data and the output data are in natural order. We report performance results for one-dimensional power-of-two FFTs on the HITACHI SR8000, a distributed-memory parallel computer with (pseudo) vector SMP nodes. We obtained performance of about 38 GFLOPS on a 16-node SR8000.
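For readers unfamiliar with the four-step decomposition, the following is a minimal serial sketch in Python, assuming N = n1 * n2 and the index maps j = j1 + j2*n1 and k = k2 + k1*n2. It is not the implementation evaluated in the paper: the function names `dft` and `four_step_fft` are hypothetical, a direct O(n^2) DFT stands in for the short FFTs, and vectorization, the five-step variant, and the all-to-all communication are omitted.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT; a stand-in for the short FFTs in each step."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def four_step_fft(x, n1, n2):
    """Length n1*n2 DFT of x via the four-step decomposition (illustrative)."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")

    # Step 1: n1 independent transforms of length n2 (over j2, for each j1).
    rows = [dft([x[j1 + j2 * n1] for j2 in range(n2)]) for j1 in range(n1)]

    # Step 2: multiply element (j1, k2) by the twiddle factor w_N^(j1*k2).
    for j1 in range(n1):
        for k2 in range(n2):
            rows[j1][k2] *= cmath.exp(-2j * cmath.pi * j1 * k2 / n)

    # Step 3: transpose the n1 x n2 array into an n2 x n1 array.
    cols = [[rows[j1][k2] for j1 in range(n1)] for k2 in range(n2)]

    # Step 4: n2 independent transforms of length n1 (over j1, for each k2).
    cols = [dft(col) for col in cols]

    # Write the result in natural order: X[k2 + k1*n2] = cols[k2][k1].
    out = [0j] * n
    for k2 in range(n2):
        for k1 in range(n1):
            out[k2 + k1 * n2] = cols[k2][k1]
    return out
```

In this sketch both the input and the output are in natural order, and the transpose in step 3 is the only step that moves data between the two array layouts. In a distributed-memory setting, a step of this kind is where data exchange between nodes would be needed.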