We describe a novel, practical parallel FFT scheme designed for SIMD systems and/or data-parallel programming. A bit-exchange of elements between the processors is avoided by means of the 'Transpose Algorithm'. Our transposition is based on assigning the data field to a 1-dimensional ring of systolic cells, which is subsequently mapped onto a ring of processors realized as a subset of the system's connectivity. We have implemented and benchmarked a 2-dimensional parallel FFT code on the APE100/Quadrics parallel computer, where, owing to its rigid next-neighbour connectivity and lack of local addressing, efficient FFT implementations could not be realized so far.
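As an illustration of the general transpose idea only, the following minimal Python sketch computes a 2-dimensional FFT as row FFTs, a transposition, row FFTs again, and a transposition back. It is not the APE100/Quadrics implementation and does not reproduce the systolic-cell mapping described above. The names `fft`, `ring_transpose` and `fft2_transpose` are hypothetical, the matrix is assumed to be square with a power-of-two side P, and one row is assumed to reside on each of P processors arranged in a ring. The simulated transposition uses only shifts to the next neighbour, but each processor reads a processor-dependent buffer position, so it ignores the lack of local addressing mentioned above.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT of a sequence whose length is a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ring_transpose(rows):
    """Transpose a P x P matrix held one row per (simulated) processor,
    using only next-neighbour shifts on a ring of P processors."""
    p_count = len(rows)
    result = [[None] * p_count for _ in range(p_count)]
    buffers = [list(r) for r in rows]  # row currently visiting each processor
    for step in range(p_count):
        for p in range(p_count):
            owner = (p - step) % p_count  # processor whose row is now at p
            # Assumption: processor p may read position p of its buffer,
            # i.e. local addressing is available (unlike on APE100).
            result[p][owner] = buffers[p][p]
        if step < p_count - 1:
            # Every processor passes its buffer to the next ring neighbour.
            buffers = [buffers[(p - 1) % p_count] for p in range(p_count)]
    return result


def fft2_transpose(a):
    """2-D FFT: row FFTs, transpose, row FFTs, transpose back."""
    stage1 = [fft(row) for row in a]
    stage2 = [fft(row) for row in ring_transpose(stage1)]
    return ring_transpose(stage2)
```

In this sketch all butterfly operations act on data local to one processor, and the only communication is the P-1 ring shifts inside each transposition, which is the property that makes the transpose approach attractive on machines with next-neighbour connectivity.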