We analyze how the order in which a sender issues its outgoing messages to multiple receivers (a multicast) affects the completion time of a program on wormhole-routed distributed-memory systems. In most existing systems, the source processor itself sends the messages of a multicast as separate unicast messages. We study how best to order a set of outgoing messages, taking into account message criticality and architectural issues including link contention, multiple ports, and adaptivity in routing. First, the simple algorithm of Dikaiakos et al. (1992) is extended to obtain a static algorithm for systems that are not fully connected. Next, a dynamic message-ordering algorithm is proposed that works for any number of ports and takes advantage of routing adaptivity. Simulation results on random task graphs show improvements in completion time of 34% for the static algorithm and 44% for the dynamic algorithm over naive sequential message ordering.
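To illustrate why the send order of a multicast matters, the following is a minimal sketch and not the algorithm of this paper. It assumes a single output port, no link contention, and a hypothetical criticality measure `tail` (the work that remains after the receiver gets its message). The names `Message`, `completion_time`, and `order_by_criticality` are introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    dest: str         # receiving task (hypothetical label)
    send_time: float  # time the single port is busy sending this message
    tail: float       # assumed criticality: work remaining after receipt


def completion_time(order):
    """Finish time when messages leave one port back to back in this order."""
    clock = 0.0
    finish = 0.0
    for m in order:
        clock += m.send_time                  # port is busy until the message is out
        finish = max(finish, clock + m.tail)  # receiver still needs `tail` time
    return finish


def order_by_criticality(messages):
    """Send the message with the largest remaining work first."""
    return sorted(messages, key=lambda m: m.tail, reverse=True)


msgs = [
    Message("A", 2.0, 1.0),
    Message("B", 1.0, 10.0),
    Message("C", 3.0, 4.0),
]
naive = completion_time(msgs)                           # order A, B, C
ordered = completion_time(order_by_criticality(msgs))   # order B, C, A
print(naive, ordered)  # 13.0 11.0
```

Under these simplifying assumptions, sending the most critical message first shortens the finish time from 13.0 to 11.0. Link contention, multiple ports, and adaptive routing, which the paper addresses, are deliberately left out of this sketch.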