Venue: IEEE International Conference on High Performance Computing and Communications

Marriage Between Coordinated and Uncoordinated Checkpointing for the Exascale Era

Abstract

State-of-the-art checkpointing techniques are projected to be prohibitively expensive in the Exascale era. These techniques are most often holistic in nature, which prevents them from leveraging the advantages specific to a programming model or paradigm that could make them viable at Exascale. In this work, we present a unified, non-hierarchical model that combines uncoordinated checkpointing with coordinated system-wide checkpointing in order to capitalize on programming-model-specific advantages. In our analytical assessment, we derive closed-form formulas for the performance improvement and for the optimal checkpoint interval of the unified model. As an instantiation of our model, we propose unifying task-level checkpointing with a system-wide checkpointing scheme for task-parallel HPC applications. This instantiation has three distinct advantages: first, it reduces performance overhead by decreasing the frequency of checkpoints in the unified system; second, it enables fast failure recovery by using in-memory task-local checkpoints instead of on-disk global checkpoints; and third, it retains the high failure coverage typical of system-wide checkpointing.
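The closed-form formulas derived in the paper are not reproduced on this page. As a rough intuition for why combining the two schemes lowers the frequency of global checkpoints, the sketch below uses the classic Young first-order approximation for the optimal checkpoint interval, sqrt(2 * C * M), where C is the cost of one checkpoint and M is the mean time between failures. It then makes a purely hypothetical assumption: task-local checkpoints recover a fraction `local_coverage` of failures, so only the remaining failures force a global rollback, and the global scheme sees an effective MTBF of M / (1 - local_coverage). The function names and the coverage model are illustrative assumptions, not the authors' model.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order optimal checkpoint interval: sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def unified_global_interval(checkpoint_cost: float, mtbf: float,
                            local_coverage: float) -> float:
    """Hypothetical interval for the global (coordinated) checkpoint.

    Assumption (not from the paper): task-local checkpoints handle a fraction
    `local_coverage` of failures, so only the rest trigger a global rollback.
    The global scheme therefore sees an effective MTBF of mtbf / (1 - coverage).
    """
    if not 0.0 <= local_coverage < 1.0:
        raise ValueError("local_coverage must be in [0, 1)")
    effective_mtbf = mtbf / (1.0 - local_coverage)
    return young_interval(checkpoint_cost, effective_mtbf)


if __name__ == "__main__":
    # Global checkpoint costs 200 s; system MTBF is 10,000 s.
    print(young_interval(200.0, 10_000.0))                 # 2000.0
    # Hypothetically, 75% of failures are recovered from task-local checkpoints.
    print(unified_global_interval(200.0, 10_000.0, 0.75))  # 4000.0
```

Under these toy assumptions the global interval grows by a factor of 1 / sqrt(1 - local_coverage), so 75% local coverage doubles the interval and halves the number of global checkpoints. The paper's own formulas may account for additional costs (for example, the overhead of taking task-level checkpoints) that this sketch ignores.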