Conference: International Conference on Parallel and Distributed Computing

A Case Study of Application Structure Aware Resilience Through Differentiated State Saving and Recovery


Abstract

Resilience is a growing concern for large-scale simulations. As failures become more frequent, alternatives to global checkpointing that limit the extent of needed recovery become more desirable. Additionally, platforms will differ in both error rates and error types, so a flexible and customizable recovery strategy will be extremely helpful to the applications running on these platforms. Applications often have structures that provide logical confinement spaces that can be exploited for this purpose. We investigate a customizable recovery strategy using Chombo, a structured adaptive mesh refinement (SAMR) library, as a case study. We exploit the inherent granularities and hierarchy in SAMR to limit the impact of faults and enable localized recovery, and we identify tunable parameters for customizing the strategy depending on application and platform behavior. We use the Global View Resilience (GVR) library, which provides global versioning arrays for application-controlled state saving, as our resilience interface.
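
The idea of differentiated state saving with localized recovery can be pictured with the minimal sketch below. It is not the GVR API and not Chombo code: `VersionedArray`, `LevelState`, `advance`, `recover_level`, and the `save_every` parameter are hypothetical names invented for illustration, and the "simulation" is a trivial increment. The sketch assumes each refinement level keeps its own version history and its own saving frequency (one possible tunable parameter), so that a fault confined to one level rolls back only that level.

```python
from dataclasses import dataclass


class VersionedArray:
    """Toy stand-in for a versioned array (hypothetical, not the GVR API)."""

    def __init__(self, data):
        self.data = list(data)
        self._versions = []  # (step, snapshot) pairs, oldest first

    def save_version(self, step):
        # Store a copy so later updates do not alter the snapshot.
        self._versions.append((step, list(self.data)))

    def restore_latest(self):
        """Restore the newest snapshot and return the step it was taken at."""
        if not self._versions:
            raise RuntimeError("no saved version to restore")
        step, snapshot = self._versions[-1]
        self.data = list(snapshot)
        return step


@dataclass
class LevelState:
    array: VersionedArray
    save_every: int  # tunable: save this level's state every N steps


def advance(levels, step):
    """Advance every level by one step, saving each at its own frequency."""
    for level in levels.values():
        level.array.data = [x + 1 for x in level.array.data]
        if step % level.save_every == 0:
            level.array.save_version(step)


def recover_level(levels, name):
    """Localized recovery: roll back only the named level."""
    return levels[name].array.restore_latest()


def demo():
    levels = {
        "coarse": LevelState(VersionedArray([0, 0]), save_every=4),
        "fine": LevelState(VersionedArray([0, 0, 0, 0]), save_every=2),
    }
    for step in range(1, 6):
        advance(levels, step)
    # Simulate a fault confined to the fine level.
    levels["fine"].array.data[1] = -999
    restored_step = recover_level(levels, "fine")
    return restored_step, levels["fine"].array.data, levels["coarse"].array.data


if __name__ == "__main__":
    print(demo())
```

In this toy setup the fine level is restored from its own most recent snapshot while the coarse level is left untouched. A real application would still have to re-advance the restored level to bring it back in step with the rest of the hierarchy, which the sketch omits.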
