We present a hardware-algorithm for sorting N elements using either a p-sorter or a sorting network of fixed I/O size p while strictly enforcing conflict-free memory accesses. To the best of our knowledge, this is the first realistic design that achieves optimal time performance, running in /spl Theta/(NlogN/plogp) time for all ranges of N. Our result completely resolves the problem of designing an implementable, time-optimal algorithm for sorting N elements using a p-sorter. More importantly, however, our result shows that, in order to achieve optimal time performance, all that is needed is a sorting network of depth O(log/sup 2/p) such as, for example, Batcher's classic bitonic sorting network.
展开▼