Shuffle/Exchange networks with redundant paths are interconnection networks designed to provide fault tolerance in high-performance computing systems. Using the redundant paths, whether for random access or for permutation routing, requires identifying faults in the currently active paths so that the redundant paths can be used to avoid them. So far, no prior work has provided a detailed mechanism for how and when the redundant paths are to be used. In this paper, a routing technique is described that avoids a single fault in the Omega-Plus network, a network with an extra switching stage that provides two paths between any processor-memory pair. The technique is then extended to handle permutation routing in the presence of a single fault. The proposed approach requires periodic diagnosis and saving the system state at checkpoints. Since checkpointing adds overhead to normal processing, the expected number of permutations to be performed and the optimal checkpoint interval are then derived for a sequence of P permutations.
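As an illustration of how an extra stage can yield two paths per source-destination pair, the following is a minimal sketch and not the paper's routing technique. It assumes a generic Omega network on 2^n terminals with one extra shuffle-exchange stage placed in front, standard destination-tag routing in the remaining n stages, and a fault model in which a single output link of some stage is faulty. The function names `rotate_left`, `route`, and `pick_path` are hypothetical.

```python
def rotate_left(pos, n):
    """Perfect shuffle on an n-bit address: rotate the bits left by one."""
    mask = (1 << n) - 1
    return ((pos << 1) | (pos >> (n - 1))) & mask


def route(src, dst, n, extra_bit):
    """Return the link position reached after each of the n + 1 stages.

    Assumption: the extra stage comes first and its switch setting
    (extra_bit) is free; the other n stages use the destination bits,
    most significant bit first.
    """
    bits = [extra_bit] + [(dst >> (n - 1 - i)) & 1 for i in range(n)]
    pos = src
    path = []
    for bit in bits:
        pos = rotate_left(pos, n)   # shuffle wiring into the stage
        pos = (pos & ~1) | bit      # switch selects upper (0) or lower (1) output
        path.append(pos)
    return path


def pick_path(src, dst, n, faulty_link):
    """Pick one of the two paths that avoids faulty_link = (stage, position).

    Returns (extra_bit, path), or None when both paths use the faulty link.
    """
    stage, position = faulty_link
    for extra_bit in (0, 1):
        path = route(src, dst, n, extra_bit)
        if path[stage] != position:
            return extra_bit, path
    return None


if __name__ == "__main__":
    print(route(5, 2, 3, 0))
    print(route(5, 2, 3, 1))
    print(pick_path(5, 2, 3, (1, 4)))
```

Under these assumptions the two candidate paths differ on every link except the final one, so a fault on the last link into the destination cannot be avoided by choosing the other path; `pick_path` returns `None` in that case.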