Venue: International Conference on High Performance Computing Simulation
A Selective and Incremental Backup Scheme for Task Pools

Abstract

Checkpointing is a common approach to preventing the loss of a program's state after permanent node failures. When it is performed at the application level, less data needs to be saved. This paper proposes an uncoordinated application-level checkpointing technique for task pools. It selectively and incrementally saves only those tasks that have stayed in the pool for some period of time and have not been saved before. The checkpoints are held in a resilient in-memory data store. Our technique applies to any task pool variant in which workers operate at the top of local pools and work stealing operates at the bottom. Furthermore, the tasks must be free of side effects, and the final result must be calculated by reduction from individual task results. We implemented the technique for the lifeline-based global load balancing variant of task pools. This variant couples random victim selection with an overlay graph for termination detection. A fault-tolerant realization already exists in the form of a Java library called JFT_GLB, which builds on the APGAS and Hazelcast libraries. Our implementation modifies JFT_GLB by replacing its nonselective checkpointing scheme with our new one. In experiments, we compared the overhead of the new scheme with that of the original JFT_GLB scheme, using UTS, BC, and two synthetic benchmarks. The new scheme required slightly more running time when local pools were small, and paid off otherwise.
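The selection rule in the abstract (save only tasks that have stayed in the pool for some period and have not been saved before) can be illustrated with a minimal Java sketch. This is not the JFT_GLB implementation and uses neither APGAS nor Hazelcast: the class `SelectiveBackupPool`, its `Task` type, and all method names are hypothetical, a plain `HashMap` stands in for the resilient in-memory store, and "stayed for some period" is assumed to mean "was present at the previous checkpoint and is still present now". Removing backups of tasks that later leave the pool, and saving task results, are deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of selective, incremental task backup.
class SelectiveBackupPool {

    // Hypothetical task type; the id only serves as a backup key.
    static final class Task {
        final long id;
        final String payload;

        Task(long id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // Local pool: head of the deque is the top, tail is the bottom.
    private final Deque<Task> pool = new ArrayDeque<>();
    // Ids of tasks that were in the pool at the previous checkpoint.
    private final Set<Long> seenAtLastCheckpoint = new HashSet<>();
    // Assumption: stand-in for a resilient in-memory data store.
    private final Map<Long, Task> backup = new HashMap<>();

    // The worker inserts and removes tasks at the top.
    void push(Task t) { pool.addFirst(t); }
    Task pop() { return pool.pollFirst(); }

    // Thieves take tasks from the bottom.
    Task steal() { return pool.pollLast(); }

    // Saves tasks that survived since the previous checkpoint and are not
    // yet in the backup. Returns the number of tasks newly written.
    int checkpoint() {
        int written = 0;
        Set<Long> present = new HashSet<>();
        for (Task t : pool) {
            present.add(t.id);
            boolean stayed = seenAtLastCheckpoint.contains(t.id);
            boolean alreadySaved = backup.containsKey(t.id);
            if (stayed && !alreadySaved) {
                backup.put(t.id, t);
                written++;
            }
        }
        // Remember the current pool contents for the next checkpoint.
        seenAtLastCheckpoint.clear();
        seenAtLastCheckpoint.addAll(present);
        return written;
    }

    int backupSize() { return backup.size(); }

    public static void main(String[] args) {
        SelectiveBackupPool p = new SelectiveBackupPool();
        p.push(new Task(1, "a"));
        p.push(new Task(2, "b"));
        p.push(new Task(3, "c"));
        System.out.println("written: " + p.checkpoint());
        p.pop();                      // worker takes task 3 from the top
        p.push(new Task(4, "d"));
        p.steal();                    // thief takes task 1 from the bottom
        System.out.println("written: " + p.checkpoint());
        System.out.println("written: " + p.checkpoint());
    }
}
```

In this sketch, short-lived tasks near the top are typically processed before they are ever written, which mirrors the intuition in the abstract that the scheme should pay off more when local pools are large.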