By exploiting the inherent parallelism in digital signal processing algorithms, significant savings in area and power consumption may be achieved. Completely parallel computation can, however, lead to excessive area, so mapping the algorithm onto reduced computational resources becomes beneficial. As a drawback, data interconnections become more complex and require storage in order to keep the processing computationally correct. We have proposed a systematic design methodology for managing a class of data interconnections called stride permutations. These stride permutations are found in several algorithms, including fast Fourier transforms and Viterbi decoding. The proposed methodology leads to regular and scalable permutation networks that support power-of-two strides. In addition, the networks reach the lower bound on the number of registers, which indicates area efficiency. In this paper, the proposed networks are evaluated in terms of control, area, power consumption, and timing.
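For readers unfamiliar with the term, the following is a minimal software sketch of what a stride permutation does to a data vector. It assumes the common textbook definition, in which a stride-S permutation of N elements reads the input with stride S (equivalently, transposes an (N/S) x S matrix stored row by row). The function name `stride_permutation` is hypothetical and is not taken from the paper; the sketch illustrates only the reordering itself, not the proposed register-based networks.

```python
def stride_permutation(x, stride):
    """Reorder x by reading it with the given stride (assumed textbook definition).

    Output element i is taken from input index (i * stride) mod N + (i * stride) // N,
    where N = len(x). The stride must divide N.
    """
    n = len(x)
    if stride <= 0 or n % stride != 0:
        raise ValueError("stride must be a positive divisor of len(x)")
    return [x[(i * stride) % n + (i * stride) // n] for i in range(n)]


data = list(range(8))
by_two = stride_permutation(data, 2)    # even-indexed elements first, then odd
by_four = stride_permutation(data, 4)   # stride-4 read order
# For N = 8, applying stride 2 and then stride 4 restores the original order.
restored = stride_permutation(by_two, 4)
```

With eight elements, stride 2 yields the familiar even/odd split used between FFT stages, and stride 4 is its inverse, so the two applied in sequence return the data to its original order.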