An improved ant colony optimization algorithm with fault tolerance for job scheduling in grid computing systems

Abstract

The Grid scheduler schedules user jobs on the best available resource, in terms of resource characteristics, by optimizing job execution time. Resource failure in the Grid is no longer an exception but a regularly occurring event, as resources are increasingly used by the scientific community to solve computationally intensive problems that typically run for days or even months. It is therefore essential that these long-running applications tolerate failures and avoid recomputation from scratch after a resource failure, in order to satisfy the user’s Quality of Service (QoS) requirement. Job Scheduling with Fault Tolerance in Grid Computing using Ant Colony Optimization is proposed to ensure that jobs are executed successfully even when resource failure occurs. The technique employed in this paper combines the resource failure rate with a checkpoint-based rollback recovery strategy. Checkpointing aims to reduce the amount of work lost when the system fails by immediately saving the state of the system. The proposed approach is compared with an existing Ant Colony Optimization (ACO) algorithm. The experimental results of the implemented fault-tolerant scheduling algorithm show an improvement in meeting the user’s QoS requirement over the existing ACO algorithm, which has no integrated fault tolerance. The two algorithms were evaluated in terms of the three main scheduling performance metrics: makespan, throughput and average turnaround time.
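The three metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so the definitions below are common textbook ones and should be read as assumptions: makespan as the span from the earliest submission to the latest completion, throughput as jobs completed per unit of makespan, and turnaround as completion time minus submission time. The `JobRecord` type and the function names are hypothetical, introduced only for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class JobRecord:
    """Hypothetical record of one completed job (times in arbitrary units)."""
    submit_time: float
    finish_time: float


def makespan(jobs: Sequence[JobRecord]) -> float:
    # Assumed definition: latest completion minus earliest submission.
    return max(j.finish_time for j in jobs) - min(j.submit_time for j in jobs)


def throughput(jobs: Sequence[JobRecord]) -> float:
    # Assumed definition: completed jobs per unit of makespan.
    return len(jobs) / makespan(jobs)


def average_turnaround(jobs: Sequence[JobRecord]) -> float:
    # Assumed definition: mean of (completion - submission) over all jobs.
    return sum(j.finish_time - j.submit_time for j in jobs) / len(jobs)
```

Under these assumed definitions, a fault-tolerant scheduler that resumes from a checkpoint instead of restarting a failed job would tend to lower `finish_time`, which in turn lowers makespan and average turnaround and raises throughput.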