Energy profile of rollback-recovery strategies in high performance computing

Abstract

Extreme-scale computing is set to provide the infrastructure for the advances and breakthroughs that will solve some of the hardest problems in science and engineering. However, resilience and energy concerns loom as two of the major challenges for machines at that scale. The number of components that will be assembled in the supercomputers plays a fundamental role in these challenges. First, a large number of parts will substantially increase the failure rate of the system compared to the failure frequency of current machines. Second, those components have to fit within the power envelope of the installation and keep the energy consumption within operational margins. Extreme-scale machines will have to incorporate fault tolerance mechanisms and honor the energy and power restrictions. Therefore, it is essential to understand how fault tolerance and energy consumption interplay. This paper presents a comparative evaluation and analysis of the energy consumption of three different rollback-recovery protocols: checkpoint/restart, message logging, and parallel recovery. Our experimental evaluation shows that parallel recovery has the lowest execution time and energy consumption. Additionally, we present an analytical model that projects that parallel recovery can reduce energy consumption by more than 37% compared to checkpoint/restart at extreme scale.
Bibliographic details

  • Source
    Parallel Computing | 2014, Issue 9 | pp. 536-547 | 12 pages
  • Author affiliations

    Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign, United States;

    Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign, United States;

    Parallel Programming Laboratory, Department of Computer Science, University of Illinois at Urbana-Champaign, United States;

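  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Full-text language: English (eng)
  • Chinese Library Classification
  • Keywords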

    Rollback-recovery; Checkpoint/restart; Message logging; Parallel recovery; Energy consumption;

    机译:回滚恢复;检查点/重启;消息记录;并行恢复;能源消耗;
