When a data-parallel language like Fortran 90 is compiled for a distributed-memory machine, aggregate data objects (such as arrays) are distributed across the processor memories. The mapping determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an alignment that maps all the objects to an abstract template, and then a distribution that maps the template to the processors. The authors solve two facets of the problem of finding alignments that reduce residual communication, i.e., determining both the alignments that vary in loops, and the objects that should have replicated alignments. They show that loop-dependent mobile alignment is sometimes necessary for optimum performance, and they provide algorithms with which a compiler can determine good mobile alignments for objects within do loops. They also identify situations in which replicated alignment is either required by the program itself (via spread operations) or can be used to improve performance. An algorithm based on network flow that determines which objects to replicate so as to minimize the total amount of broadcast communication in replication is proposed.
展开▼