Journal: ACM Transactions on Architecture and Code Optimization

Hybrid Checkpointing Using Emerging Nonvolatile Memories for Future Exascale Systems



Abstract

The scalability of future Massively Parallel Processing (MPP) systems is severely challenged by high failure rates. Current centralized Hard Disk Drive (HDD) checkpointing incurs an overhead of 25% or more at petascale. Because systems become more vulnerable as node counts keep increasing, novel techniques that enable fast and frequent checkpointing are critical to implementing future exascale systems. In this work, we first introduce one of the emerging nonvolatile memory technologies, Phase-Change Random Access Memory (PCRAM), as a suitable candidate for a fast checkpointing device. After a thorough analysis of MPP systems, failure rates, and failure sources, we propose a PCRAM-based hybrid local/global checkpointing mechanism that not only provides faster checkpoint storage but also boosts the effectiveness of orthogonal techniques such as incremental checkpointing and background checkpointing. Three variant implementations of PCRAM-based hybrid checkpointing are designed to be adopted at different stages and to offer a smooth transition from conventional in-disk checkpointing to the instant in-memory approach. Analyzing the overhead with a hybrid checkpointing performance model, we show that the proposed approach incurs less than 3% performance overhead on a projected exascale system.
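
To make the link between checkpoint speed and overhead concrete, the sketch below uses Young's classic first-order approximation for the optimal checkpoint interval. This is not the hybrid performance model from the paper, which the abstract does not spell out. The function names and the sample checkpoint times and failure interval are hypothetical, chosen only for illustration, and the approximation is meaningful only when the checkpoint time is much smaller than the mean time between failures.

```python
import math


def young_interval(checkpoint_time_s: float, mtbf_s: float) -> float:
    """Young's first-order optimal checkpoint interval: sqrt(2 * delta * MTBF)."""
    return math.sqrt(2.0 * checkpoint_time_s * mtbf_s)


def overhead_fraction(checkpoint_time_s: float, mtbf_s: float) -> float:
    """Approximate overhead at the optimal interval, relative to useful work.

    First term: time spent writing checkpoints per unit of work.
    Second term: expected rework after a failure (about half an interval
    is lost per failure, and failures arrive once per MTBF on average).
    """
    tau = young_interval(checkpoint_time_s, mtbf_s)
    return checkpoint_time_s / tau + tau / (2.0 * mtbf_s)


if __name__ == "__main__":
    mtbf_s = 3600.0  # hypothetical system-wide mean time between failures
    # Hypothetical checkpoint write times, not measurements from the paper.
    for label, delta_s in [("slower storage", 60.0), ("faster storage", 1.0)]:
        tau = young_interval(delta_s, mtbf_s)
        ovh = overhead_fraction(delta_s, mtbf_s)
        print(f"{label}: interval = {tau:.1f} s, overhead = {ovh:.1%}")
```

Under this approximation the overhead scales with the square root of the checkpoint time divided by the failure interval, which is why a faster checkpoint device matters more as failures become more frequent.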
