In this paper, we present a compiler optimization that recognizes patterns of collective communication at runtime in data-parallel languages that allow dynamic data decomposition. Its computation time is $O(m)$, which makes it suitable for large numerical applications and massively parallel machines. The previous approach required $O(n_0 + n_1 + \cdots + n_{m-1})$ time, where $m$ is the number of dimensions of an array and $n_i$ is the array size in the $i$-th dimension. The new method can be used for data redistribution and intrinsic procedures, as well as for data prefetching in parallelized loops.
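To make the gap between the two bounds concrete, consider a hypothetical three-dimensional array with 1000 elements in each dimension. The sizes are chosen only for illustration, and the counts below ignore the constants hidden by the $O$-notation:

```latex
\begin{align*}
  m &= 3, \qquad n_0 = n_1 = n_2 = 1000 \\
  \text{previous approach:}\quad n_0 + n_1 + n_2 &= 3000 \text{ steps} \\
  \text{new method:}\quad m &= 3 \text{ steps}
\end{align*}
```

The previous cost grows with the array extents, whereas the new cost depends only on the array's rank, so the advantage widens as arrays grow larger.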