Published in: IEEE Transactions on Dependable and Secure Computing

A New Diskless Checkpointing Approach for Multiple Processor Failures

Abstract

Diskless checkpointing is an important technique for performing fault tolerance in distributed or parallel computing systems. This study proposes a new approach to enhance neighbor-based diskless checkpointing to tolerate multiple failures using simple checkpointing and failure recovery operations, without relying on dedicated checkpoint processors. In this scheme, each processor saves its checkpoints in a set of peer processors, called checkpoint storage nodes. In return, each processor uses simple XOR operations to store a collection of checkpoints for the processors for which it is a checkpoint storage node. This study defines the concept of safe recovery criterion, which specifies the requirement for ensuring that any failed processor can be recovered in a single step using the checkpoint data stored at one of the surviving processors, as long as no more than a given number of failures occur. This study further identifies the necessary and sufficient conditions for satisfying the safe recovery criterion and presents a method for designing checkpoint storage node sets that meet these requirements. The proposed scheme allows failure recovery to be performed in a distributed manner using XOR operations.
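
The XOR-based storage and one-step recovery described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the function names (`xor_bytes`, `build_parity`, `clients_of`, `recover`), the four-processor layout in `storage_sets`, and the toy checkpoint contents are all hypothetical. The layout is not the paper's design method and is not claimed to satisfy the safe recovery criterion for more than one failure.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def build_parity(checkpoints, storage_sets):
    """For each processor s, XOR together the checkpoints of every
    processor that lists s as one of its checkpoint storage nodes."""
    size = len(next(iter(checkpoints.values())))
    parity = {s: bytes(size) for s in checkpoints}
    for p, nodes in storage_sets.items():
        for s in nodes:
            parity[s] = xor_bytes(parity[s], checkpoints[p])
    return parity


def clients_of(s, storage_sets):
    """Processors whose checkpoints are folded into the parity held at s."""
    return {p for p, nodes in storage_sets.items() if s in nodes}


def recover(p, failed, checkpoints, parity, storage_sets):
    """Try to rebuild the checkpoint of failed processor p in one step.

    A storage node s is usable only if s itself survived and every other
    processor folded into parity[s] also survived, so that their local
    checkpoints can be XORed back out. Returns None if no node qualifies.
    """
    for s in sorted(storage_sets[p]):
        if s in failed:
            continue
        others = clients_of(s, storage_sets) - {p}
        if others & failed:
            continue
        data = parity[s]
        for q in sorted(others):
            # Only checkpoints of surviving processors are read here.
            data = xor_bytes(data, checkpoints[q])
        return data
    return None


# Hypothetical layout: processor i stores at (i+1) % 4 and (i+2) % 4.
storage_sets = {0: (1, 2), 1: (2, 3), 2: (3, 0), 3: (0, 1)}
checkpoints = {0: b"aaaa", 1: b"bbbb", 2: b"cccc", 3: b"dddd"}
parity = build_parity(checkpoints, storage_sets)

print(recover(0, {0}, checkpoints, parity, storage_sets))     # b'aaaa'
print(recover(0, {0, 1}, checkpoints, parity, storage_sets))  # None
```

In this toy layout a single failure of processor 0 is recovered from the parity held at processor 1, while the simultaneous failure of processors 0 and 1 leaves processor 0 unrecoverable in one step. That gap is the kind of situation the safe recovery criterion is meant to rule out through the choice of checkpoint storage node sets.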