The authors present the design of a software reconfiguration strategy for hypercube multicomputer architectures that tolerates multiple faults. Unlike previous schemes, the strategy requires no redundant hardware; instead, it supports reconfiguration through graceful degradation. It runs multiple virtual processors on each physical processor and uses these virtual processors to redistribute the workload when faults occur. The authors describe an environment, developed on a commercially available Intel iPSC/2 hypercube multicomputer, that implements this software-based fault-tolerance scheme. They report experiments, performed in this environment, that measure the performance degradation of application programs under hardware faults. The reconfiguration scheme incurs low overhead at low cost and even improves efficiency on a fault-free hypercube.
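The idea of redistributing virtual processors under faults can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the choice of two virtual processors per node, and the placement rule (nearest healthy node by Hamming distance, then least loaded) are all assumptions made for illustration only.

```python
from collections import defaultdict


def hamming(a, b):
    """Hop count between two hypercube node labels."""
    return bin(a ^ b).count("1")


def initial_mapping(dim, vps_per_node=2):
    """Hypothetical layout: virtual processor id -> physical node label."""
    return {vp: vp // vps_per_node for vp in range(vps_per_node * 2 ** dim)}


def load(mapping):
    """Number of virtual processors hosted on each physical node."""
    counts = defaultdict(int)
    for node in mapping.values():
        counts[node] += 1
    return counts


def reconfigure(mapping, faulty, dim):
    """Move virtual processors off faulty nodes (assumed placement rule).

    Each displaced virtual processor goes to the healthy node that is
    nearest in Hamming distance; ties go to the least-loaded node, then
    to the lowest node label.
    """
    healthy = [n for n in range(2 ** dim) if n not in faulty]
    if not healthy:
        raise RuntimeError("no healthy physical node left")
    # Start from the load contributed by virtual processors that stay put.
    counts = load({vp: n for vp, n in mapping.items() if n not in faulty})
    new_mapping = dict(mapping)
    for vp, node in sorted(mapping.items()):
        if node not in faulty:
            continue
        target = min(healthy, key=lambda n: (hamming(n, node), counts[n], n))
        new_mapping[vp] = target
        counts[target] += 1
    return new_mapping


if __name__ == "__main__":
    before = initial_mapping(dim=3)            # 8 nodes, 16 virtual processors
    after = reconfigure(before, faulty={5}, dim=3)
    moved = {vp: n for vp, n in after.items() if before[vp] != n}
    print(moved)
```

In this sketch a fault removes no work from the system: the surviving nodes simply host more virtual processors each, which is one way to read "graceful degradation" without spare hardware.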