Advances in computer and network technologies have led to the development of distributed systems, in which an application is realized by multiple processes that compute and communicate by exchanging messages through communication channels. Some mission-critical applications must be executed fault-tolerantly, and one important method for achieving fault tolerance is checkpoint-recovery. To restart execution correctly after recovery, the set of checkpoints taken by all processes must form a consistent global checkpoint.
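The consistency requirement can be illustrated with a minimal sketch. It assumes the common textbook definition, which the abstract itself does not spell out: a global checkpoint is consistent if it contains no orphan message, that is, no message whose receipt is recorded in the receiver's checkpoint while its sending is not recorded in the sender's checkpoint. The names `Message` and `is_consistent` and the event-index encoding are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    send_event: int   # 0-based index of the send event in the sender's history
    receiver: str
    recv_event: int   # 0-based index of the receive event in the receiver's history


def is_consistent(checkpoint, messages):
    """Return True if the global checkpoint contains no orphan message.

    `checkpoint` maps each process name to the number of local events
    executed before its checkpoint was taken (assumed encoding), so a
    checkpoint value k covers events 0 .. k-1.
    """
    for m in messages:
        received = m.recv_event < checkpoint[m.receiver]
        sent = m.send_event < checkpoint[m.sender]
        if received and not sent:
            # Receipt is recorded but the send is not: orphan message.
            return False
    return True


# One message from p1 (its event 2) to p2 (its event 1).
messages = [Message("p1", 2, "p2", 1)]

consistent = is_consistent({"p1": 3, "p2": 2}, messages)    # send and receive both recorded
inconsistent = is_consistent({"p1": 1, "p2": 2}, messages)  # receive recorded, send not
```

In the second call, restarting from the checkpoints would leave p2 having received a message that p1 has not yet sent, which is why such a set of checkpoints cannot be used for recovery.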