Reducing communication costs can significantly improve the execution time of a parallel program. This paper presents a new approach for communication optimization in data parallel programs that is based on global data flow analysis and performance prediction. Our techniques are based on simple yet highly effective data flow equations which are solved iteratively for arbitrary control flow graphs. Previous techniques are based on fixed communication optimization strategies whose quality can be very sensible to changes of problem and machine sizes. Our algorithm is novel in that we carefully examine tradeoffs between communication latency hiding and reducing the number and volume of messages (e.g. message coalescing and aggregating) by systematically evaluating a reasonable set of promising communication placements for a given program covering several (possibly conflicting) communication guiding profit motives. P~3T, a state-of-the-art performance estimator that carefully models problem and machine characteristics, is used to ensure communication buffer-safety and to find the best communication placement of all created ones. Employing an accurate performance estimator opens ground for more aggressive communication optimization opportunities that carefully examine performance gains and losses among applicable optimization strategies which is not achievable with most existing approaches. Our techniques are based on data flow analyses that enable vectorizing, coalescing and aggregating communication, and overlapping communication with computation both within and across loop nests all in a unified framework. We present results for the Meiko CS-2 and a network of Sparc workstations based on a preliminary implementation which show that our method implies a significant reduction in communication costs and demonstrate the effectiveness of this analysis in improving the overall performance of data parallel programs.
展开▼