
A Case Study of Application Structure Aware Resilience Through Differentiated State Saving and Recovery


Abstract

Resilience is a growing concern for large-scale simulations. As failures become more frequent, alternatives to global checkpointing that limit the extent of needed recovery become more desirable. Additionally, platforms will differ in both error rates and error types, so a flexible and customizable recovery strategy will be extremely helpful to the applications running on them. Applications often have structures that provide logical confinement spaces that can be exploited for this purpose. We investigate a customizable recovery strategy using Chombo, a structured adaptive mesh refinement (SAMR) library, as a case study. We exploit the inherent granularities and hierarchy in SAMR to limit the impact of faults for localized recovery, and we identify tunable parameters for customizing the strategy depending upon the application and platform behavior. As our resiliency interface, we use the Global View Resilience (GVR) library, which provides global versioning arrays for application-controlled state saving.
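
The idea of differentiated state saving with localized recovery can be illustrated with a small sketch. This is not the GVR or Chombo API. The `VersionedPatchStore` class, its method names, the patch identifiers, and the per-level save intervals are all hypothetical, chosen only to show how one mesh patch might be saved at a level-specific frequency and restored without rolling back the rest of the hierarchy.

```python
from copy import deepcopy


class VersionedPatchStore:
    """Toy in-memory stand-in for a versioned array store (not the GVR API)."""

    def __init__(self, save_interval_by_level):
        # Hypothetical tunable parameter: steps between saves at each level.
        self.save_interval_by_level = save_interval_by_level
        # (level, patch_id) -> list of (step, saved_copy), oldest first.
        self.versions = {}

    def maybe_save(self, step, level, patch_id, data):
        """Save a copy of one patch if its level is due at this step."""
        if step % self.save_interval_by_level[level] != 0:
            return False
        key = (level, patch_id)
        self.versions.setdefault(key, []).append((step, deepcopy(data)))
        return True

    def restore_latest(self, level, patch_id):
        """Return (step, copy) of the newest saved version of one patch."""
        step, data = self.versions[(level, patch_id)][-1]
        return step, deepcopy(data)


# Illustrative intervals only: coarse level 0 every 4 steps, fine level 1 every step.
store = VersionedPatchStore(save_interval_by_level={0: 4, 1: 1})
patches = {(0, "a"): [0.0, 0.0], (1, "b"): [0.0, 0.0]}

for step in range(1, 7):
    for (level, patch_id), data in patches.items():
        data[:] = [value + 1.0 for value in data]  # stand-in for a solver update
        store.maybe_save(step, level, patch_id, data)

# Simulate a fault confined to one fine-level patch, then recover only that patch.
patches[(1, "b")][0] = float("nan")
restored_step, restored = store.restore_latest(1, "b")
patches[(1, "b")] = restored
```

In this toy run the coarse patch is never rolled back. Only the corrupted fine patch is replaced with its most recent saved version, which is the kind of confinement the abstract describes. A real strategy would also have to reconcile the restored patch with its neighbors and with coarser levels, which this sketch omits.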
