We benchmark basic data parallel communications and sorting on the CM-5 using C*. Our results indicate that: (a) reduction, prefix, common permutations, and broadcasting on virtual processors are very efficient; (b) permutations and send/get operations are relatively efficient, with a send always being faster than a get; and (c) rank is the only expensive computation. For data parallel sorting, bitonic sorting is much faster than odd-even transposition; it is inferior to merge-bitonic sort only for very large data sizes and a very small range of data parallel shape sizes. We also show how simple communication routines and sorting can be used to efficiently implement parallel network simulation.