Journal: ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages

A Tunable Holistic Resiliency Approach for High-Performance Computing Systems

Abstract

To address anticipated high failure rates, resiliency has become an urgent priority for next-generation extreme-scale high-performance computing (HPC) systems. This poster describes our past and ongoing work on novel fault resilience technologies for HPC. The work presented includes proactive fault resilience techniques, system and application reliability models and analyses, failure prediction, transparent process-level and virtual-machine-level migration, and trade-off models for combining preemptive migration with checkpoint/restart. The poster summarizes this work and places the individual technologies in the context of a proposed holistic fault resilience framework.
