International Symposium on Advanced Parallel Processing Technologies

Research on Optimum Checkpoint Interval for Hybrid Fault Tolerance

Abstract

As high-performance computing systems grow rapidly in size and complexity, passive fault tolerance alone can no longer provide system reliability effectively, because of its high overhead and poor scalability. Hybrid fault tolerance, which combines passive and active fault-tolerant approaches, has the potential to be widely used in exascale systems. However, many issues with this method still need to be resolved. This paper focuses on checkpointing in hybrid fault tolerance, where a common question is how to optimize the checkpoint interval. The paper proposes two models of systems that adopt hybrid fault tolerance and evaluates their effectiveness by comparing their results with simulation. The experimental results show that the modified model accurately predicts both the total work time and the optimum checkpoint interval.
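To make the checkpoint-interval question concrete, the sketch below uses Young's classical first-order approximation, in which the optimum interval is sqrt(2 * C * M) for checkpoint cost C and mean time between failures M. This is not either of the two models proposed in the paper, whose details the abstract does not give. The `predicted_fraction` parameter is a purely illustrative assumption: it supposes that the active (proactive) component avoids a fixed fraction of failures, so that only the remaining failures force a rollback. All function and parameter names are hypothetical.

```python
import math


def waste_fraction(interval, checkpoint_cost, mtbf):
    """First-order fraction of time lost per unit of work.

    checkpoint_cost / interval : time spent writing checkpoints
    interval / (2 * mtbf)      : expected rework after a failure
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


def young_interval(checkpoint_cost, mtbf, predicted_fraction=0.0):
    """Young's first-order optimum checkpoint interval.

    ASSUMPTION (not from the paper): a fraction `predicted_fraction`
    of failures is handled proactively and never causes a rollback,
    which stretches the effective MTBF seen by checkpointing.
    """
    if not 0.0 <= predicted_fraction < 1.0:
        raise ValueError("predicted_fraction must be in [0, 1)")
    effective_mtbf = mtbf / (1.0 - predicted_fraction)
    return math.sqrt(2.0 * checkpoint_cost * effective_mtbf)


if __name__ == "__main__":
    cost, mtbf = 60.0, 86400.0  # seconds; illustrative values only
    for fraction in (0.0, 0.5, 0.75):
        tau = young_interval(cost, mtbf, fraction)
        print(f"predicted fraction {fraction:.2f}: interval {tau:.0f} s")
```

Under this simplified assumption, the more failures the active component handles, the longer the interval between checkpoints can be. A model of hybrid fault tolerance would also need to account for prediction errors and the cost of proactive actions, which this sketch ignores.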