This paper proposes and evaluates an integrated recoverable distributed shared memory (IRDSM), which combines the coherency control of distributed shared memory with checkpoint/recovery control in a distributed environment built from a workstation cluster. The copies that a distributed shared memory (DSM) system creates so that multiple readers can access the same data simultaneously also serve as replicas for recovery. This integration reduces the data transfers needed for checkpoint/recovery. Data that has no copy is replicated lazily, because a future access to the data may create a copy anyway and thereby hide the overhead of replicating it for recovery. Lazy replication also exploits the differences between a copy and a replica to reduce the amount of data transmitted. The scheme is evaluated using programs from the SPLASH parallel benchmark suite.
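The idea of sending only the differences between a copy and a replica can be illustrated with a minimal sketch. The abstract does not specify how IRDSM encodes differences, so everything below is an assumption made for illustration: the byte-run encoding, the equal-length pages, and the hypothetical names `make_diff` and `apply_diff`.

```python
from typing import List, Tuple

# Hypothetical diff format (an assumption, not taken from the paper):
# a list of (offset, changed_bytes) runs.
Diff = List[Tuple[int, bytes]]


def make_diff(replica: bytes, copy: bytes) -> Diff:
    """Collect the byte runs where `copy` differs from `replica`."""
    if len(replica) != len(copy):
        raise ValueError("pages must have the same length")
    diff: Diff = []
    i, n = 0, len(copy)
    while i < n:
        if replica[i] != copy[i]:
            start = i
            # Extend the run while the bytes keep differing.
            while i < n and replica[i] != copy[i]:
                i += 1
            diff.append((start, copy[start:i]))
        else:
            i += 1
    return diff


def apply_diff(replica: bytes, diff: Diff) -> bytes:
    """Bring a stale replica up to date by patching in the changed runs."""
    page = bytearray(replica)
    for offset, data in diff:
        page[offset:offset + len(data)] = data
    return bytes(page)
```

In this sketch, only the changed runs would need to cross the network when a replica is brought up to date, rather than the whole page.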