Journal: International Journal of High Performance Computing and Networking

Error recovery mechanism for grid-based workflow within SLA context

Abstract

Service Level Agreements (SLAs) serve as a foundation for reliable and predictable job execution at remote grid sites. In this paper, we describe an error recovery mechanism for workflows within the SLA context that copes with catastrophic failure, in which one or several High Performance Computing Centers (HPCCs) are detached from the grid system. We propose one algorithm to detect all affected sub-jobs when the error occurs and another to remap those sub-jobs to the remaining healthy HPCCs while optimising the makespan. Experimental results show that our mechanism finds a higher-quality solution in less time than existing methods.
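
The abstract does not give the details of the two algorithms, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the workflow is a DAG of sub-jobs, that a sub-job counts as affected if it is unfinished on a failed HPCC or depends on such a sub-job, that runtimes are the same on every site, and that data transfer time is ignored. The remapping step is a plain greedy earliest-finish-time heuristic swapped in for illustration; all function and variable names are hypothetical.

```python
from collections import deque
from graphlib import TopologicalSorter  # Python 3.9+


def affected_subjobs(deps, placement, finished, failed_sites):
    """Return unfinished sub-jobs on failed sites plus all their descendants.

    deps maps each sub-job to the list of sub-jobs it depends on.
    """
    successors = {job: [] for job in deps}
    for job, preds in deps.items():
        for pred in preds:
            successors[pred].append(job)

    seeds = [j for j in deps if placement[j] in failed_sites and j not in finished]
    affected = set(seeds)
    queue = deque(seeds)
    while queue:  # breadth-first walk over downstream sub-jobs
        job = queue.popleft()
        for nxt in successors[job]:
            if nxt not in affected and nxt not in finished:
                affected.add(nxt)
                queue.append(nxt)
    return affected


def remap(deps, runtime, affected, free_at, finish):
    """Greedily place affected sub-jobs on healthy sites (assumed heuristic).

    free_at maps each healthy site to the time it becomes free;
    finish holds known finish times of the unaffected sub-jobs.
    """
    free_at, finish = dict(free_at), dict(finish)
    placement = {}
    graph = {j: [p for p in deps[j] if p in affected] for j in sorted(affected)}
    for job in TopologicalSorter(graph).static_order():
        ready = max((finish[p] for p in deps[job]), default=0.0)
        # Pick the site on which this sub-job can start (and so finish) earliest.
        site = min(free_at, key=lambda s: max(free_at[s], ready))
        finish[job] = max(free_at[site], ready) + runtime[job]
        free_at[site] = finish[job]
        placement[job] = site
    return placement, max(finish.values())


# Toy workflow: a -> {b, c} -> d, with site H2 detached after "a" has finished.
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
old_placement = {"a": "H1", "b": "H2", "c": "H3", "d": "H1"}
affected = affected_subjobs(deps, old_placement, finished={"a"}, failed_sites={"H2"})
new_placement, makespan = remap(
    deps,
    runtime={"b": 3, "d": 2},
    affected=affected,
    free_at={"H1": 2, "H3": 6},
    finish={"a": 2, "c": 6},
)
```

In this toy case only `b` and its dependent `d` are rescheduled, while `c` keeps its original slot on H3. A greedy pass like this gives no optimality guarantee for the makespan; it only shows the detect-then-remap structure the abstract describes.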
