Conference: International Conference on High Performance Computing, Networking, Storage and Analysis 2009
Supporting Fault-Tolerance for Time-Critical Events in Distributed Environments


Abstract

In this paper, we consider the problem of supporting fault tolerance for adaptive and time-critical applications in heterogeneous and unreliable grid computing environments. Our goal for this class of applications is to optimize a user-specified benefit function while meeting the time deadline. Our first contribution in this paper is a multi-objective optimization algorithm for scheduling the application onto the most efficient and reliable resources. In this way, the processing can achieve the maximum benefit while also maximizing the success-rate, which is the probability of finishing execution without failures. However, for the cases where failures do occur, we have developed a hybrid failure-recovery scheme to ensure that the application can complete within the pre-specified time interval. Our experimental results show that our scheduling algorithm can achieve better benefit when compared to several heuristics-based greedy scheduling algorithms, while still having a negligible overhead. Benefit is further improved when we apply the hybrid failure recovery scheme, and the success-rate becomes 100%.
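The trade-off between benefit and success-rate described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the `Resource` fields, the toy benefit function (work processed before the deadline), the assumption of independent failures, and the sample numbers are all hypothetical, chosen only to show what scheduling against two objectives looks like.

```python
from dataclasses import dataclass
from itertools import combinations
from math import prod


@dataclass(frozen=True)
class Resource:
    name: str
    speed: float        # work units per second (hypothetical)
    reliability: float  # P(no failure before the deadline) (hypothetical)


def evaluate(plan, deadline):
    """Return (benefit, success_rate) for a tuple of resources."""
    benefit = sum(r.speed for r in plan) * deadline   # toy benefit function
    success_rate = prod(r.reliability for r in plan)  # assumes independent failures
    return benefit, success_rate


def pareto_front(plans, deadline):
    """Keep the plans that no other plan beats on both objectives."""
    scored = [(evaluate(p, deadline), p) for p in plans]
    front = []
    for (b, s), p in scored:
        dominated = any(
            ob >= b and os_ >= s and (ob > b or os_ > s)
            for (ob, os_), _ in scored
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical pool of four resources; each plan uses two of them.
resources = [
    Resource("A", speed=10.0, reliability=0.90),
    Resource("B", speed=8.0, reliability=0.99),
    Resource("C", speed=5.0, reliability=0.999),
    Resource("D", speed=4.0, reliability=0.95),
]
plans = list(combinations(resources, 2))
front = pareto_front(plans, deadline=10.0)
front_names = [tuple(r.name for r in p) for p in front]
print(front_names)
```

In this toy setting, every plan that includes resource D is dominated, and the remaining plans represent different compromises between processing more work and finishing without a failure. A real scheduler would still have to pick one point on that front, for example by weighting the two objectives.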