Published in: IEEE International Parallel and Distributed Processing Symposium

Adaptive Incremental Checkpointing via Delta Compression for Networked Multicore Systems


Abstract

Checkpointing is widely used to support fault tolerance and job migration, with checkpoint files preferably also kept at remote storage so that they survive unavailability or failure of local nodes in networked systems. On large-scale systems, I/O bandwidth to remote storage has recently become the checkpointing bottleneck. This paper proposes adaptive incremental checkpointing (AIC), which aims to reduce checkpoint file size considerably so that checkpointing overhead is lowered and the expected job turnaround time drops. Because production multicore systems are observed to often have unused cores available, AIC is designed to use separate cores to carry out multi-level checkpointing with delta compression, adaptively, at desirable points in time. We develop a new Markov model for predicting the performance of such multi-level concurrent checkpointing, and evaluate AIC using six SPEC benchmarks under various system sizes. AIC is observed to lower the normalized expected turnaround time substantially (by up to 47%) compared with its static counterpart and with a recent multi-level checkpointing scheme that uses fixed checkpoint intervals.
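The abstract does not describe AIC's delta encoder, so the following is only a minimal sketch of the general idea of delta-compressed incremental checkpointing, not the paper's method. It assumes two equal-length memory images, uses a byte-wise XOR as the delta, and uses zlib as a stand-in compressor; the function names `make_delta` and `apply_delta` are hypothetical.

```python
import zlib


def make_delta(prev: bytes, curr: bytes) -> bytes:
    """Build a compressed delta between two checkpoint images.

    Assumption: both images have the same length. Unchanged bytes XOR
    to zero, so a mostly unchanged image yields long zero runs that
    compress well.
    """
    if len(prev) != len(curr):
        raise ValueError("this sketch assumes equal-length images")
    xored = bytes(a ^ b for a, b in zip(prev, curr))
    return zlib.compress(xored)


def apply_delta(prev: bytes, delta: bytes) -> bytes:
    """Rebuild the newer image from the older image and the delta."""
    xored = zlib.decompress(delta)
    if len(xored) != len(prev):
        raise ValueError("delta does not match the base image length")
    return bytes(a ^ b for a, b in zip(prev, xored))


# Illustrative data only: a 64 KiB image in which 10 bytes change.
prev_image = bytes(range(256)) * 256
changed = bytearray(prev_image)
changed[1000:1010] = b"\xff" * 10
curr_image = bytes(changed)

delta = make_delta(prev_image, curr_image)
restored = apply_delta(prev_image, delta)
```

Only the delta would need to travel to remote storage between full checkpoints, which is the kind of file-size reduction the abstract targets. Deciding when to take a checkpoint, the adaptive part of AIC, is outside the scope of this sketch.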
