Minimizing the time spent on communications when mapping affine loop nests onto distributed-memory parallel computers (DMPCs) is a key performance problem, and many authors have dealt with it. Not all communications are equivalent: local communications (translations), simple communications (horizontal or vertical ones), and structured communications (broadcasts, gathers, scatters, or reductions) are performed much faster on DMPCs than general affine communications. Dion, Randriamaro and Robert have presented a heuristic based on the following strategy: (1) zero out as many nonlocal communications as possible; (2) since it is generally impossible to obtain a fully communication-local mapping, try to optimize the residual communications. The aim of this paper is to present an evaluation of the heuristic given by Dion, Randriamaro and Robert. First, we recall the motivations of their approach and evaluate its efficiency on classical linear algebra examples.
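The distinction between local and general affine communications can be illustrated with a small sketch. Here is a minimal, hypothetical check (the representation and names are illustrative, not taken from the paper): if each array's placement is an affine map p(i) = A·i + b over iteration vectors, the communication between a source and a destination placement is a translation, and hence local, exactly when the two maps share the same linear part A, so every element moves by the constant offset b_dst − b_src.

```python
# Illustrative sketch, assuming placements are affine maps p(i) = A.i + b
# represented as a matrix (list of rows) and an offset vector.

def is_translation(A_src, b_src, A_dst, b_dst):
    """Return True when the communication from placement (A_src, b_src)
    to placement (A_dst, b_dst) is a translation, i.e. the linear parts
    agree and only the constant offsets differ."""
    return A_src == A_dst

A = [[1, 0], [0, 1]]  # identity linear part shared by both placements
# Same linear part, offsets differ by (1, 0): a translation (local).
print(is_translation(A, [0, 0], A, [1, 0]))
# Different linear parts: a general affine communication (nonlocal).
print(is_translation(A, [0, 0], [[0, 1], [1, 0]], [0, 0]))
```

In this view, step (1) of the heuristic amounts to choosing the linear parts of the placements so that as many communications as possible pass this test.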