Journal: IEEE Transactions on Parallel and Distributed Systems

Fault-Tolerant Scheduling for Real-Time Scientific Workflows with Elastic Resource Provisioning in Virtualized Clouds



Abstract

Clouds are becoming an important platform for scientific workflow applications. However, with many nodes deployed in clouds, managing resource reliability becomes a critical issue, especially for real-time scientific workflows whose deadlines must be met. Fault tolerance in clouds is therefore essential. Primary-backup (PB) scheduling is a popular fault-tolerance technique that has been used effectively in cluster and grid computing. Applying it to real-time workflows in a virtualized cloud, however, is much more complicated and has rarely been studied. In this paper, we address this problem. We first establish a real-time workflow fault-tolerant model that extends the traditional PB model by incorporating cloud characteristics. Based on this model, we develop approaches for task allocation and message transmission that ensure faults can be tolerated during workflow execution. Finally, we propose a dynamic fault-tolerant scheduling algorithm, FASTER, for real-time workflows in the virtualized cloud. FASTER has three key features: 1) it employs a backward shifting method to make full use of idle resources and incorporates task overlapping and VM migration for high resource utilization; 2) it applies a vertical/horizontal scaling-up technique to quickly provision resources for a burst of workflows; and 3) it uses a vertical scaling-down scheme to avoid unnecessary and ineffective resource changes caused by fluctuating workflow requests. We evaluate FASTER with synthetic workflows and with workflows collected from real scientific and business applications, and compare it with six baseline algorithms. The experimental results demonstrate that FASTER effectively improves resource utilization and schedulability even in the presence of node failures in virtualized clouds.
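To make the primary-backup idea concrete, the following is a minimal Python sketch of the general PB constraint only; it is not the FASTER algorithm, whose details the abstract does not give. It assumes a passive backup that starts only after the primary was due to finish, requires the two copies to sit on different hosts, and accepts a task only if the backup would still meet the deadline. The names `Host`, `Placement`, and `place_with_backup` are hypothetical, introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    free_at: float = 0.0  # earliest time this host can start new work


@dataclass
class Placement:
    primary_host: str
    primary_finish: float
    backup_host: str
    backup_finish: float


def place_with_backup(hosts: List[Host], arrival: float,
                      exec_time: float, deadline: float) -> Optional[Placement]:
    """Greedy primary-backup placement (illustrative assumption, not FASTER).

    Returns None, leaving all hosts unchanged, when the task cannot be
    protected by a backup that finishes before the deadline.
    """
    if len(hosts) < 2:
        return None  # a backup needs a host distinct from the primary's

    # Primary copy: the host that can start (and so finish) earliest.
    primary = min(hosts, key=lambda h: max(h.free_at, arrival))
    p_finish = max(primary.free_at, arrival) + exec_time

    # Backup copy: a different host, started only after the primary was
    # due to finish, so a primary failure can be detected first.
    others = [h for h in hosts if h is not primary]
    backup = min(others, key=lambda h: max(h.free_at, p_finish))
    b_finish = max(backup.free_at, p_finish) + exec_time

    if b_finish > deadline:
        return None  # deadline not guaranteed under a single host failure

    # Conservative reservation: the backup slot is held exclusively, with
    # no overlapping of backups and no reuse of the idle gap before it.
    primary.free_at = p_finish
    backup.free_at = b_finish
    return Placement(primary.name, p_finish, backup.name, b_finish)
```

With two idle hosts, a task with execution time 2 and deadline 5 is accepted, with its primary finishing at time 2 and its backup at time 4. An identical second task is then rejected, because its backup could not finish before time 6. The exclusive backup reservation in this sketch is exactly the kind of waste that the task overlapping and backward shifting mentioned in the abstract are meant to reduce.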
