Conference: International Parallel and Distributed Processing Symposium

Stable Checkpointing in Distributed Systems without Shared Disks



Abstract

Interacting processes in distributed systems save their checkpoints on local disks for efficiency. However, because local checkpoints become unavailable when their hosts fail, redundancy schemes similar to RAID storage schemes have to be used. In such systems, checkpoints are stable under a particular fault model because they can be reconstructed within the distributed system. This paper compares two variants of stable checkpoint storage: (i) parity grouping over local checkpoints and (ii) RAID-like distribution of each checkpoint using a software-based distributed storage system. An analysis compares the costs of collective checkpoint creation, recovery of a single process, and rollback of all processes. The results show that, despite differences in detail, checkpointing using a distributed storage system is a reasonable solution.
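The abstract does not spell out either scheme, so the following is only a minimal sketch of the two ideas it names, assuming single-parity XOR redundancy (as in RAID 4/5) and a fault model of one failed host at a time. All function names are hypothetical, and the paper's actual protocols may differ. Variant (i) keeps each checkpoint whole on its own host and stores one parity block per group; variant (ii) splits a single checkpoint into chunks spread over several hosts plus a parity chunk.

```python
def xor_blocks(blocks):
    """XOR a non-empty list of equal-length byte strings."""
    size = len(blocks[0])
    out = bytearray(size)
    for block in blocks:
        if len(block) != size:
            raise ValueError("all blocks must have the same length")
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def pad(data, size):
    """Zero-pad data to the given size (zero bytes do not change an XOR)."""
    return data + b"\x00" * (size - len(data))


# Variant (i): parity grouping over whole local checkpoints.
def group_parity(checkpoints):
    """Parity block for a group of checkpoints, one checkpoint per host."""
    size = max(len(c) for c in checkpoints)
    return xor_blocks([pad(c, size) for c in checkpoints])


def recover_from_group(survivors, parity, lost_len):
    """Rebuild the single lost checkpoint from the survivors and the parity.

    lost_len is assumed to be kept as metadata alongside the parity block.
    """
    size = len(parity)
    blocks = [pad(c, size) for c in survivors] + [parity]
    return xor_blocks(blocks)[:lost_len]


# Variant (ii): RAID-like distribution of one checkpoint.
def stripe_with_parity(checkpoint, n_data):
    """Split one checkpoint into n_data chunks plus one parity chunk."""
    chunk = max(1, -(-len(checkpoint) // n_data))  # ceiling division
    chunks = [
        pad(checkpoint[i * chunk:(i + 1) * chunk], chunk)
        for i in range(n_data)
    ]
    return chunks, xor_blocks(chunks)


def rebuild_striped(chunks, parity, original_len):
    """Reassemble a checkpoint; at most one chunk may be None (host lost)."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost chunk")
    chunks = list(chunks)
    if missing:
        present = [c for c in chunks if c is not None]
        chunks[missing[0]] = xor_blocks(present + [parity])
    return b"".join(chunks)[:original_len]
```

In this sketch, recovering a single process under variant (i) requires reading every surviving checkpoint in the group, whereas under variant (ii) every checkpoint write and read already touches several hosts. This is the kind of cost trade-off the abstract says the paper analyzes.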
