International Conference on Grid and Cooperative Computing; 16-18 August 2007; Urumchi (CN)

Fault Tolerance-Genetic Algorithm for Grid Task Scheduling using Check Point

Abstract

One motivation of Grid computing is to aggregate the power of widely distributed resources and provide non-trivial services to users. To achieve this goal, an efficient fault tolerance system is an essential part of the Grid. Rather than covering the whole area of Grid fault tolerance, this survey reviews the subject mainly from the perspective of checkpointing and identifies the challenges for fault tolerance. In Grid environments, execution failures can occur for various reasons, such as network failures, overloaded resources, or the unavailability of required software components. Fault-tolerant systems should therefore be able to identify and handle failures and support reliable execution in the presence of concurrency and failures. When a large number of user jobs are scheduled for parallel execution on an open-resource Grid system, the jobs are subject to system failures or delays caused by infected hardware, software vulnerabilities, and untrusted security policies. In this paper we propose task-level fault tolerance. Task-level techniques mask the effects of task execution failures. The four task-level techniques are retry, alternate resource, checkpointing, and replication. The checkpointing strategy achieves optimal load balance across different Grid sites. In a risky Grid computing environment, these task-level fault-tolerance techniques can improve Grid performance significantly at the cost of only a moderate amount of extra resources or scheduling delay.
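The abstract names four task-level techniques (retry, alternate resource, checkpointing, replication) without showing how they differ in practice. The following is a minimal Python sketch under stated assumptions, not the paper's method: `Resource`, `ResourceFailure`, `retry`, `alternate_resource`, `checkpointed`, and `replicate` are hypothetical names, the "task" is just summing the integers below `total_steps`, and a failure is modeled as a single transient fault at a fixed step. It does not model the genetic-algorithm scheduler named in the title.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Dict, List, Optional


class ResourceFailure(Exception):
    """Raised when a simulated (hypothetical) grid resource fails mid-task."""


@dataclass
class Resource:
    # Hypothetical grid resource that suffers one transient fault at `fail_at_step`.
    name: str
    fail_at_step: Optional[int] = None
    work: int = 0  # number of task steps this resource actually executed

    def run(self, total_steps: int, start: int = 0, acc: int = 0,
            checkpoint: Optional[Dict[str, int]] = None, every: int = 0) -> int:
        """Toy task: sum 0..total_steps-1, resuming from (start, acc)."""
        for step in range(start, total_steps):
            if step == self.fail_at_step:
                self.fail_at_step = None  # transient fault: a later attempt succeeds
                raise ResourceFailure(f"{self.name} failed at step {step}")
            acc += step
            self.work += 1
            if checkpoint is not None and every > 0 and (step + 1) % every == 0:
                checkpoint["step"], checkpoint["acc"] = step + 1, acc  # save state
        return acc


def retry(total_steps: int, res: Resource, max_retries: int = 3) -> int:
    """Retry: rerun the whole task on the same resource after a failure."""
    for _ in range(max_retries + 1):
        try:
            return res.run(total_steps)
        except ResourceFailure:
            pass
    raise RuntimeError("all retries failed")


def alternate_resource(total_steps: int, resources: List[Resource]) -> int:
    """Alternate resource: restart the task from scratch on the next resource."""
    for res in resources:
        try:
            return res.run(total_steps)
        except ResourceFailure:
            pass
    raise RuntimeError("no resource completed the task")


def checkpointed(total_steps: int, resources: List[Resource], every: int = 10) -> int:
    """Checkpoint: resume from the last saved state on the next resource."""
    ckpt = {"step": 0, "acc": 0}
    for res in resources:
        try:
            return res.run(total_steps, ckpt["step"], ckpt["acc"], ckpt, every)
        except ResourceFailure:
            pass  # work done since the last checkpoint is lost
    raise RuntimeError("no resource completed the task")


def replicate(total_steps: int, resources: List[Resource]) -> int:
    """Replication: run copies concurrently and keep the first successful result."""
    with ThreadPoolExecutor(max_workers=len(resources)) as pool:
        futures = [pool.submit(r.run, total_steps) for r in resources]
        for fut in as_completed(futures):
            if fut.exception() is None:
                return fut.result()
    raise RuntimeError("all replicas failed")


if __name__ == "__main__":
    a, b = Resource("A", fail_at_step=95), Resource("B")
    print(alternate_resource(100, [a, b]), a.work + b.work)  # 4950 195
    a, b = Resource("A", fail_at_step=95), Resource("B")
    print(checkpointed(100, [a, b], every=10), a.work + b.work)  # 4950 105
```

In the two runs under `__main__`, a fault at step 95 of a 100-step task costs 195 executed steps when the task is simply restarted on another resource, but only 105 with a checkpoint every 10 steps, because only steps 90 to 94 are redone. A real checkpointing system would also pay for writing state to stable storage, which this toy model ignores.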
