Software-based hardware fault tolerance describes a class of techniques that allow software to detect and correct errors introduced by unreliable hardware. With the advent of many-core architectures, existing reliability issues, such as temporal and structural variations and sensitivity to soft errors, are becoming an even more serious problem. Software-based hardware fault tolerance can provide cost-effective solutions. This presentation will point out the new opportunities and challenges of applying software-based hardware fault tolerance to emerging many-core architectures. We will discuss the tradeoff between these techniques and classical hardware-based fault tolerance in terms of fault coverage, overhead, and performance.