Conference: Advances in grid and pervasive computing

A Probabilistic Fault-Tolerant Recovery Mechanism for Task and Result Certification of Large-Scale Distributed Applications

Abstract

This paper addresses fault-tolerant recovery mechanisms and probabilistic result certification on large-scale architectures. Related work on result certification relies on total or partial duplication of the application, but it is limited to executions of independent tasks. In the present work, we extend these mechanisms to applications with dependent tasks. First, we propose an approach based on an abstract representation of a parallel execution, called a macro-dataflow graph. Second, we introduce probabilistic certification algorithms that avoid re-executing the program and allow recovery on different platforms with different numbers of processors. We also sketch how to simulate our framework based on state-of-the-art workload modeling and fault injection tools.