A highly reliable program execution environment that enables user programs to tolerate underlying hardware failures is presented. The approach is to run multiple copies of a user program at the same time. As long as one copy survives, the user program completes successfully. Meanwhile, the user interacts with the replicated program as if it were a single, normal program. The authors call this characteristic user-transparent replication. To achieve user-transparent replication, program replicas must behave consistently; otherwise, users might receive different queries or output from different running replicas. The authors identify the reasons why inconsistent program execution occurs and propose algorithms to ensure that computation replicas behave consistently. With consistently running program replicas, a filter program can be easily constructed to delete duplicated I/O requests or duplicated output and thus achieve user transparency.
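The filtering idea can be illustrated with a minimal sketch. This is not the authors' filter program. It assumes that consistent replicas emit identical outputs in the same order, and that each output carries a per-replica sequence number. The message shape `(replica_id, sequence_number, payload)` and the name `filter_duplicates` are hypothetical, introduced only for illustration.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical message shape (an assumption, not from the paper):
# (replica_id, sequence_number, payload)
Message = Tuple[int, int, str]


def filter_duplicates(messages: Iterable[Message]) -> Iterator[str]:
    """Forward each output once, however many replicas produce it.

    Relies on the assumption that consistent replicas generate the
    same payload for the same sequence number.
    """
    delivered = set()  # sequence numbers already passed to the user
    for _replica_id, seq, payload in messages:
        if seq in delivered:
            continue  # another replica already produced this output
        delivered.add(seq)
        yield payload


if __name__ == "__main__":
    # Three replicas; replica 2 fails after its first output.
    stream = [
        (1, 0, "Enter amount:"),
        (2, 0, "Enter amount:"),
        (3, 0, "Enter amount:"),
        (3, 1, "Done."),
        (1, 1, "Done."),
    ]
    print(list(filter_duplicates(stream)))  # ['Enter amount:', 'Done.']
```

Under these assumptions the user sees each query and each result exactly once, and the loss of replica 2 is invisible as long as at least one replica keeps producing output.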