Run-time data redistribution can improve algorithm performance on distributed-memory machines. Data can be redistributed between algorithm phases when a different data decomposition is expected to deliver better performance in a subsequent phase of computation. Redistribution can also occur at subprogram boundaries. It adds program overhead, however, because computation must pause while data are exchanged among processor memories. In this paper, we present a technique for data-processor mapping, applicable to data redistribution, that minimizes the total amount of data that must be communicated among processors. The mapping technique is architecture-independent and represents our initial work toward achieving efficient redistribution in distributed-memory machines.