Conference: Fault-Tolerant Computing, 1995. FTCS-25. Digest of Papers, Twenty-Fifth International Symposium on

A recoverable distributed shared memory integrating coherence and recoverability

Abstract

Large-scale distributed systems are very attractive for the execution of parallel applications requiring a huge computing power. However, their high probability of site failure is unacceptable, especially for long-running applications. In this paper, we address this problem and propose a checkpointing mechanism relying on a recoverable distributed shared memory (DSM) in order to tolerate single node failures. Although most recoverable DSMs require specific hardware to store recovery data, our scheme uses standard memories to store both current and recovery data. Moreover, the management of recovery data is merged with the management of current data by extending the DSM's coherence protocol. This approach takes advantage of the data replication provided by a DSM in order to limit the amount of transferred pages during the checkpointing. The paper also presents an implementation and a preliminary performance evaluation of our recoverable DSM on a 56-node Intel Paragon.
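
The abstract does not spell out the protocol, so the following is only a minimal sketch of the general idea it describes: at checkpoint time, copies that the DSM has already replicated can be reused as recovery data, and a page is transferred only when it exists on a single node. The `Page` class, the `checkpoint` and `survives_failure` functions, and the rule of keeping two recovery copies on distinct nodes are all assumptions made for illustration, not the paper's actual design.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical model of one DSM page (not taken from the paper)."""
    page_id: int
    # Nodes that currently hold a valid copy of the page.
    holders: set[int] = field(default_factory=set)
    # Nodes whose copy is designated as recovery data by the last checkpoint.
    recovery_holders: set[int] = field(default_factory=set)


def checkpoint(pages: list[Page], nodes: list[int]) -> int:
    """Give every page two recovery copies on distinct nodes.

    Assumption: two copies on different nodes are enough to tolerate
    a single node failure. Returns the number of page transfers needed.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to survive one failure")
    transfers = 0
    for page in pages:
        if not page.holders:
            raise ValueError(f"page {page.page_id} has no valid copy")
        # Reuse copies that already exist thanks to DSM replication.
        chosen = sorted(page.holders)[:2]
        if len(chosen) < 2:
            # Only one copy exists: ship one copy to another node.
            target = next(n for n in nodes if n not in page.holders)
            page.holders.add(target)
            chosen.append(target)
            transfers += 1
        page.recovery_holders = set(chosen)
    return transfers


def survives_failure(pages: list[Page], failed_node: int) -> bool:
    """True if every page keeps a recovery copy after one node fails."""
    return all(page.recovery_holders - {failed_node} for page in pages)
```

In this toy model, a page that is already shared by two or more nodes costs no transfer at checkpoint time; only pages held by a single node are copied. With three pages held by nodes {0, 1}, {2}, and {0, 1, 3} on a four-node system, `checkpoint` performs a single transfer.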