Software reliability is becoming increasingly important as computer systems take on ever greater roles in everyday life. This paper proposes a software-based redundant-execution programming model for detecting and correcting transient faults. Multithreading provides thread-level redundant execution for fault detection, and majority voting is used to recover from errors. A watchdog thread handles threads that fail to respond. Preliminary experiments on benchmark programs show that the proposed programming model can detect errors caused by transient faults and that the majority-voting strategy can correctly resume program execution. Applying the proposed model can improve programs' fault tolerance.
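The mechanism summarized above (redundant threads, a vote over their results, and a watchdog for silent replicas) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function name `run_redundant`, the default of three replicas, and the timeout value are all assumptions made for illustration, and the watchdog role is approximated by a bounded wait in the calling thread rather than by a dedicated watchdog thread.

```python
import threading
import time
from collections import Counter


def run_redundant(task, args=(), replicas=3, timeout=1.0):
    """Run `task` in several threads and majority-vote on the results.

    Hypothetical helper, not taken from the paper. Returns a pair
    (value, fault_detected). Results must be hashable so they can be counted.
    """
    results = [None] * replicas
    done = [False] * replicas

    def worker(i):
        results[i] = task(*args)
        done[i] = True  # set only after the result has been stored

    threads = [
        threading.Thread(target=worker, args=(i,), daemon=True)
        for i in range(replicas)
    ]
    for t in threads:
        t.start()

    # Watchdog stand-in: wait for all replicas, but never past the deadline.
    # A replica that has not finished by then is treated as unresponsive.
    deadline = time.monotonic() + timeout
    for t in threads:
        t.join(max(0.0, deadline - time.monotonic()))

    # Vote only over the replicas that actually answered.
    votes = Counter(results[i] for i in range(replicas) if done[i])
    if not votes:
        raise RuntimeError("no replica responded before the deadline")

    value, count = votes.most_common(1)[0]
    if count <= replicas // 2:
        raise RuntimeError("no majority among replica results")

    # Any disagreement or missing answer counts as a detected fault.
    return value, count < replicas


if __name__ == "__main__":
    value, fault_detected = run_redundant(lambda x: x * x, args=(4,))
    print(value, fault_detected)
```

With three replicas, a single corrupted or missing result is outvoted by the two that agree, so execution can continue with the majority value. If no strict majority exists, the sketch raises an error instead of guessing.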