Fault-tolerant distributed shared memory (DSM) systems do not always need to support complete and consistent recovery after a failure. We describe a framework within which different approaches to, and different degrees of, consistency and recoverability can be understood. Consistent failure recovery may be added from either of two viewpoints: an application-oriented view or a memory-oriented view. The major characteristics used in our framework are variations of availability, consistency, and application support. The paper explains the basic model, which is used in RELIABLE MIRAGE+, and describes how other researchers can use the framework to understand and classify solutions to the reliable DSM problem. The model distinguishes a recoverable system, which must be able to survive any single-site failure, from a reliable system, which also ensures consistency after recovery. Since consistency requirements may impose a high penalty on performance during normal operation, the multi-level model describes various relaxed levels of recoverability consistency. Under this model, recovery may be accomplished by having applications specify their consistency and availability requirements.