Conference: IEEE International Symposium on Cluster Computing and the Grid
A resource manager for optimal resource selection and fault tolerance service in Grids

Abstract

In this paper, we address the issues of resource management and fault tolerance in Grids. In Grids, the state of the resources selected for job execution is a primary factor that determines computing performance. Specifically, we propose a resource manager for optimal resource selection. The resource manager automatically selects the optimal resources among candidate resources using a genetic algorithm. Typically, the probability of failure is higher in Grid computing than in traditional parallel computing, and a resource failure can be fatal to job execution. Therefore, a fault tolerance service is essential in computational Grids, and Grid services are often expected to meet minimum levels of quality of service (QoS) for desirable operation. To address this issue, we also propose a fault tolerance service that satisfies QoS requirements. We extend the definition of failures, such as process failure, processor failure, and network failure, and design a fault detector and a fault manager. The simulation results indicate that our approaches are promising in that (1) our resource manager finds the optimal set of resources that guarantees the optimal performance; (2) the fault detector detects the occurrence of resource failures; and (3) the fault manager guarantees that submitted jobs complete and, through job migration, improves job execution performance even when some failures occur.
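The abstract does not give the details of the genetic algorithm, so the following is only a minimal sketch of how a genetic algorithm might pick k resources from a candidate pool. Everything in it is an assumption made for illustration: the resource model (a CPU speed and a current load), the fitness function (a synchronous parallel job bounded by its slowest selected resource), the operators, and all function and parameter names. It is not the paper's method.

```python
import random

def make_candidates(n, rng):
    """Hypothetical candidate pool: (cpu_speed, current_load in [0, 0.9])."""
    return [(rng.uniform(1.0, 4.0), rng.uniform(0.0, 0.9)) for _ in range(n)]

def fitness(subset, candidates):
    """Assumed objective (higher is better): a synchronous parallel job
    runs at the pace of its slowest selected resource."""
    effective = [candidates[i][0] * (1.0 - candidates[i][1]) for i in subset]
    return len(subset) * min(effective)

def crossover(a, b, k, rng):
    """Child draws k distinct resources from the union of both parents."""
    pool = sorted(set(a) | set(b))
    return tuple(sorted(rng.sample(pool, k)))

def mutate(subset, n, rate, rng):
    """With probability `rate`, swap one selected resource for an unselected one."""
    if rng.random() >= rate:
        return subset
    unused = [i for i in range(n) if i not in subset]
    if not unused:
        return subset
    chosen = list(subset)
    chosen[rng.randrange(len(chosen))] = rng.choice(unused)
    return tuple(sorted(chosen))

def select_resources(candidates, k, generations=50, pop_size=30, rate=0.2, seed=0):
    """Return (best_subset, history) where history holds the best fitness
    seen at each generation. The result is heuristic, not provably optimal."""
    rng = random.Random(seed)
    n = len(candidates)
    population = [tuple(sorted(rng.sample(range(n), k))) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda s: fitness(s, candidates), reverse=True)
        history.append(fitness(population[0], candidates))
        elite = population[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, k, rng), n, rate, rng))
        population = elite + children
    best = max(population, key=lambda s: fitness(s, candidates))
    history.append(fitness(best, candidates))
    return best, history

if __name__ == "__main__":
    cands = make_candidates(12, random.Random(1))
    best, history = select_resources(cands, k=4)
    print("selected resource indices:", best)
    print("best fitness:", history[-1])
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. A real resource manager would replace the toy fitness function with measured resource state and job requirements.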