Conference: 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing

Selective Recovery from Failures in a Task Parallel Programming Model

Abstract

We present a fault tolerant task pool execution environment that is capable of performing fine-grain selective restart using a lightweight, distributed task completion tracking mechanism. Compared with conventional checkpoint/restart techniques, this system offers a recovery penalty that is proportional to the degree of failure rather than the system size. We evaluate this system using the Self Consistent Field (SCF) kernel which forms an important component in ab initio methods for computational chemistry. Experimental results indicate that fault tolerant task pools are robust in the presence of an arbitrary number of failures and that they offer low overhead in the absence of faults.
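The idea of a recovery penalty that scales with the degree of failure can be illustrated with a toy simulation. The sketch below is not the system described in the abstract: it is a single-process model that assumes each task's result lives only on the worker that ran it, and that a central dictionary stands in for the distributed completion tracking. The function name `run_task_pool` and the `fail_after` failure-injection parameter are hypothetical, introduced only for illustration.

```python
from collections import deque


def run_task_pool(num_tasks, num_workers, fail_after=None):
    """Simulate a task pool with selective re-execution after worker failures.

    fail_after (hypothetical failure injection) maps an execution count to
    the id of the worker that fails right after that many task executions.
    Returns (completed_by, executions), where completed_by maps each task id
    to the worker that holds its result.
    """
    fail_after = dict(fail_after or {})
    alive = list(range(num_workers))
    pending = deque(range(num_tasks))
    completed_by = {}  # completion tracking: task id -> worker holding the result
    executions = 0
    turn = 0

    while pending:
        # Hand the next task to a surviving worker, round-robin.
        worker = alive[turn % len(alive)]
        turn += 1
        task = pending.popleft()
        completed_by[task] = worker
        executions += 1

        # Inject a failure if one is scheduled at this point.
        failed = fail_after.pop(executions, None)
        if failed is not None and failed in alive and len(alive) > 1:
            alive.remove(failed)
            # Selective recovery: only tasks whose results were held by the
            # failed worker are marked incomplete and queued again.
            lost = sorted(t for t, w in completed_by.items() if w == failed)
            for t in lost:
                del completed_by[t]
            pending.extend(lost)

    return completed_by, executions
```

With 12 tasks on 4 workers and worker 1 failing after 8 executions, `run_task_pool(12, 4, {8: 1})` performs 14 executions in total: the 12 original tasks plus the 2 whose results were held by the failed worker. A restart from a global checkpoint would instead repeat all work done since that checkpoint, regardless of how many workers failed.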