This thesis concerns the design of a distributed system that transparently [6] supports fault tolerance. To provide this, a lightweight language, the Fault Tolerant Distributed Language (FTDL), has been developed that supports the essential features required to enable fault tolerance. This language is the user interface to the fault-tolerant distributed runtime architecture. By adopting a hybrid approach based on existing work in distributed systems, a model for distributed fault-tolerant computation has been constructed based on distributed shared memory and communicating processes. The effectiveness of the model in the face of failure is measured. The manner in which the model deals with failures, the degradation of the system in the face of failures, and the overhead associated with the fault-tolerant components are explored. Overall, the model successfully demonstrates the requirements for building a transparent fault-tolerant system.