
Understanding the Impact of BPRAM on Incremental Checkpoint


Abstract

Large-scale systems suffer from a variety of hardware and software failures, which motivates research on fault-tolerance techniques. Checkpoint-restart is a widely applied fault-tolerance approach, especially in scientific computing systems. However, checkpoint overhead strongly affects overall system performance. Recently, emerging byte-addressable, persistent memory (BPRAM) technologies, such as phase change memory (PCM), have made it possible to implement checkpointing at arbitrary data granularity. The impact of data granularity on checkpointing cost, however, has not been fully addressed. In this paper, we investigate how data granularity influences the performance of a checkpoint system. Further, we design and implement a high-performance checkpoint system named AG-ckpt. AG-ckpt is a hybrid-granularity incremental checkpointing scheme built on (1) low-cost modified-memory detection and (2) fine-grained memory duplication. We also formulate the performance-granularity relationship of checkpointing systems in a mathematical model and derive the optimal solutions. We run several typical benchmarks to verify the performance gain of our design. Compared to conventional incremental checkpointing, our results show that AG-ckpt reduces the amount of checkpoint data by up to 50% and provides a 1.2x-1.3x speedup in checkpoint efficiency.
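
To make the granularity trade-off concrete, here is a minimal toy sketch in Python. It is not the AG-ckpt implementation: the abstract does not say how modified memory is detected or copied, so the sketch simply compares the current memory image against the previous snapshot. The 4096-byte page size, the 64-byte block size, and all function names are assumptions chosen for illustration.

```python
PAGE_SIZE = 4096   # assumed coarse granularity (a typical OS page size)
BLOCK_SIZE = 64    # assumed fine granularity (hypothetical choice)


def dirty_ranges(old, new, granularity, start=0, end=None):
    """Return (offset, length) for each chunk of `granularity` bytes that differs."""
    if end is None:
        end = len(new)
    ranges = []
    for off in range(start, end, granularity):
        stop = min(off + granularity, end)
        if old[off:stop] != new[off:stop]:
            ranges.append((off, stop - off))
    return ranges


def page_checkpoint(old, new):
    """Page-granularity incremental checkpoint: save every modified page in full."""
    return dirty_ranges(old, new, PAGE_SIZE)


def hybrid_checkpoint(old, new):
    """Two-step sketch: find dirty pages, then keep only the dirty blocks inside them."""
    saved = []
    for page_off, page_len in dirty_ranges(old, new, PAGE_SIZE):
        saved.extend(
            dirty_ranges(old, new, BLOCK_SIZE, page_off, page_off + page_len)
        )
    return saved


def checkpoint_bytes(ranges):
    """Total number of bytes that a list of (offset, length) ranges would copy."""
    return sum(length for _, length in ranges)


def demo():
    """Modify one byte in each of two pages and compare the data to be saved."""
    old = bytes(4 * PAGE_SIZE)
    new = bytearray(old)
    new[10] = 1                 # one byte changed in page 0
    new[PAGE_SIZE + 100] = 2    # one byte changed in page 1
    new = bytes(new)
    return (
        checkpoint_bytes(page_checkpoint(old, new)),
        checkpoint_bytes(hybrid_checkpoint(old, new)),
    )


if __name__ == "__main__":
    page_bytes, hybrid_bytes = demo()
    print(page_bytes, hybrid_bytes)  # 8192 128
```

In this contrived case, two single-byte writes cost two full pages at page granularity but only two 64-byte blocks in the hybrid scheme. The sketch ignores the cost of detecting changes at finer granularity, which is the trade-off the paper's model addresses, so the ratio here says nothing about the paper's measured results.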

Bibliographic record

  • Source
    IEICE Transactions on Information and Systems | 2013, Issue 3 | pp. 663-672 | 10 pages
  • Author affiliations

    The authors are with National University of Defence Technology, Changsha, China;


  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language: English
  • Keywords

    fault tolerance; incremental checkpoint; BPRAM; large scale system;

  • Added to database: 2022-08-18 00:25:56
