Journal: IEE proceedings. Part E

Effective and concurrent checkpointing and recovery in distributed systems



Abstract

The paper presents an effective, application-transparent checkpointing/rollback scheme for multiple processes that communicate via message passing in a distributed system. The authors first propose a checkpointing scheme that uses the unforced checkpointing strategy and dynamically varies checkpoint intervals according to the frequency of message sending, in order to reduce rollback propagation among processes. Additional forced checkpoints are taken only to achieve checkpoint consistency among processes and to avoid the domino effect. The authors then discuss both global rollback and minimal rollback approaches and incorporate them into the proposed checkpointing scheme. The combined checkpointing/rollback scheme can handle out-of-order messages, achieve high concurrency during checkpointing/rollback operations, and allow multiple invocations of checkpointing/rollback instances. To reduce the space overhead, a global recovery line determination approach is proposed to purge the checkpoints to which processes will never roll back. Experience with event-driven simulation indicates that the proposed scheme can effectively reduce rollback propagation while incurring little control message overhead and maintaining only a few checkpoints at each process at any time.
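The abstract does not give the algorithms themselves, so the following is only a generic textbook-style sketch of the concepts it names (rollback propagation, the domino effect, a recovery line, and checkpoint purging), not the authors' scheme. It assumes a single illustrative global clock with distinct event times; the function names `recovery_line` and `purge`, the process names, and all timestamps are hypothetical.

```python
from bisect import bisect_left


def recovery_line(checkpoints, messages):
    """Return the latest consistent set of checkpoints, one per process.

    checkpoints: dict mapping process -> sorted checkpoint times, starting with 0.
    messages: list of (sender, send_time, receiver, receive_time) tuples.
    All times are assumed distinct and measured on one illustrative global clock.
    """
    # Start from each process's most recent checkpoint.
    line = {p: cps[-1] for p, cps in checkpoints.items()}
    changed = True
    while changed:
        changed = False
        for sender, sent, receiver, received in messages:
            # Orphan message: the receive is recorded in the receiver's
            # checkpoint, but the send is undone by the sender's rollback.
            if sent > line[sender] and received < line[receiver]:
                cps = checkpoints[receiver]
                # Roll the receiver back to its last checkpoint before the receive.
                line[receiver] = cps[bisect_left(cps, received) - 1]
                changed = True
    return line


def purge(checkpoints, line):
    """Drop checkpoints that lie before the recovery line."""
    return {p: [t for t in cps if t >= line[p]] for p, cps in checkpoints.items()}


messages = [("Q", 2, "P", 5), ("P", 11, "Q", 13), ("Q", 16, "P", 18), ("P", 21, "Q", 23)]

# Independent (unforced) checkpoints only: rollback propagates to the start.
unforced = {"P": [0, 10, 20], "Q": [0, 15, 25]}
domino = recovery_line(unforced, messages)

# Hypothetical forced checkpoint: Q also saves its state at time 22,
# just before it receives the message that P sent at time 21.
forced = {"P": [0, 10, 20], "Q": [0, 15, 22, 25]}
bounded = recovery_line(forced, messages)
kept = purge(forced, bounded)

print(domino)   # {'P': 0, 'Q': 0}
print(bounded)  # {'P': 20, 'Q': 22}
print(kept)     # {'P': [20], 'Q': [22, 25]}
```

In this toy run, the unforced checkpoints alone let one orphan message drag both processes back to their initial states, while a single forced checkpoint bounds the rollback and allows the older checkpoints to be discarded. Where and when forced checkpoints are actually taken in the paper's scheme is not stated in the abstract.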
