We give a randomized algorithm for sorting strings in external memory. For K binary strings comprising N words in total, our algorithm finds the sorted order and the longest common prefix sequence of the strings using O(K/B log_(M/B)(K/M) log(N/K) + N/B) I/Os. This bound is never worse than O(K/B log_(M/B)(K/M) log log_(M/B)(K/M) + N/B) I/Os, and improves on the (deterministic) algorithm of Arge et al. (On sorting strings in external memory, STOC '97). The error probability of the algorithm can be chosen as O(N~(-c)) for any positive constant c. The algorithm even works in the cache-oblivious model under the tall cache assumption, i.e,, assuming M > B~(1+ε) for some ε > 0. An implication of our result is improved construction algorithms for external memory string dictionaries.
展开▼