Conference: Reliable Distributed Systems, 1994. Proceedings., 13th Symposium on

Coordinated checkpointing-rollback error recovery for distributed shared memory multicomputers

Abstract

Most recovery schemes that have been proposed for Distributed Shared Memory (DSM) systems require unnecessarily high checkpointing frequency and checkpoint traffic, which are sensitive to the frequency of interprocess communication in the applications. For message-passing systems, low overhead error recovery based on coordinated checkpointing allows the frequency of checkpointing to be determined only by the reliability requirements of the application. Efficient adaptation of this approach to DSM multicomputers is complicated by the absence of explicit messages in DSM systems, the presence of a shared and partially replicated address space, and the presence of a distributed coherency directory. We present solutions to these issues, and propose an error recovery scheme based on coordinated checkpointing and rollback for DSM multicomputers. Our performance evaluation based on trace-driven simulations indicates that this scheme incurs less checkpoint traffic than recovery schemes previously proposed for DSM systems.
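
As a rough illustration of the general idea of coordinated checkpointing with rollback, the following is a minimal sketch in Python. It is not the scheme proposed in the paper: it models a generic two-phase (tentative, then commit) checkpoint over in-memory objects, and it ignores the DSM-specific issues the abstract names (implicit communication, replicated pages, the coherency directory). The names `Process`, `coordinated_checkpoint`, `recover`, and `demo` are hypothetical and chosen only for this example.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Process:
    """Hypothetical process: volatile state plus a simulated stable store."""
    pid: int
    state: dict = field(default_factory=dict)
    committed: dict = field(default_factory=dict)  # last permanent checkpoint
    tentative: Optional[dict] = None               # phase-1 checkpoint, not yet permanent

    def take_tentative(self) -> bool:
        # Phase 1: save a private copy of the current state.
        self.tentative = copy.deepcopy(self.state)
        return True

    def commit(self) -> None:
        # Phase 2 (success): the tentative checkpoint becomes permanent.
        if self.tentative is not None:
            self.committed = self.tentative
            self.tentative = None

    def abort(self) -> None:
        # Phase 2 (failure): discard the tentative checkpoint.
        self.tentative = None

    def rollback(self) -> None:
        # Recovery: restore the state from the last permanent checkpoint.
        self.state = copy.deepcopy(self.committed)
        self.tentative = None


def coordinated_checkpoint(procs: List[Process]) -> bool:
    """All processes checkpoint together, or none of them does.

    Assumption: no process communicates between the two phases
    (trivially true in this sequential simulation).
    """
    ok = all([p.take_tentative() for p in procs])
    for p in procs:
        if ok:
            p.commit()
        else:
            p.abort()
    return ok


def recover(procs: List[Process]) -> None:
    """Roll every process back to the same committed checkpoint round."""
    for p in procs:
        p.rollback()


def demo() -> List[int]:
    procs = [Process(pid=i, state={"x": 0}) for i in range(3)]
    for p in procs:
        p.state["x"] = 10 + p.pid      # work done before the checkpoint
    coordinated_checkpoint(procs)
    for p in procs:
        p.state["x"] += 100            # work done after the checkpoint
    recover(procs)                     # simulated error: this work is lost
    return [p.state["x"] for p in procs]
```

Calling `demo()` returns the values saved at the checkpoint, since the work performed afterwards is discarded on rollback. Because every process restores a checkpoint from the same round, the restored states are mutually consistent, and how often `coordinated_checkpoint` is called is independent of how often the processes communicate.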
