Journal of Supercomputing

Replication and data management-based workflow scheduling algorithm for multi-cloud data centre platform



Abstract

Scientific workflow applications comprise large numbers of tasks and data sets that must be processed systematically. These applications benefit from cloud computing platforms, which offer access to virtually limitless resources provisioned elastically and on demand. Running data-intensive scientific workflows on geographically distributed data centres, however, entails massive data transfers, which affect the overall execution time and monetary cost of the workflows. Existing efforts on workflow scheduling concentrate on decreasing makespan and budget; little attention has been paid to task and data-set dependencies. In this paper, we introduce a workflow scheduling technique that reduces data transfer and executes workflow tasks within deadline and budget constraints. The proposed technique consists of an initial data placement stage, which clusters and distributes data sets based on their dependencies, and a replication-based partial critical path (R-PCP) technique, which schedules tasks with data locality and dynamically maintains a dependency matrix for the placement of generated data sets. To reduce run-time data-set movement, we use inter-data-centre task replication and data-set replication to ensure data-set availability. Simulation results with four workflow applications show that our strategy efficiently reduces data movement and executes all chosen workflows within the user-specified budget and deadline. The results reveal that R-PCP incurs 44.93% and 31.37% less data movement than random and adaptive data-aware scheduling (ADAS) techniques, respectively, and consumes 26.48% less energy than ADAS.
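The dependency-driven initial data placement described above can be sketched in a few lines. The following is a minimal illustration, not the paper's exact algorithm: the workflow, the co-usage measure, and the greedy heuristic are all assumptions made for the example. The idea is to count how many tasks consume each pair of data sets and greedily co-locate strongly coupled pairs in the same data centre, so those tasks can run with data locality.

```python
# Illustrative sketch (NOT the paper's R-PCP algorithm): cluster data sets
# across data centres by task co-usage, so data sets consumed by the same
# tasks land in the same centre. All names below are hypothetical.
from itertools import combinations

# Hypothetical workflow: task -> data sets it reads
task_inputs = {
    "t1": {"d1", "d2"},
    "t2": {"d2", "d3"},
    "t3": {"d4", "d5"},
    "t4": {"d1", "d2"},
}

def dependency(ds_a, ds_b):
    """Entry of the dependency matrix: number of tasks using both data sets."""
    return sum(1 for inputs in task_inputs.values()
               if ds_a in inputs and ds_b in inputs)

def place(datasets, n_centres):
    """Greedy placement: co-locate pairs in descending co-usage order,
    then spread any remaining data sets across the least-loaded centres."""
    centres = [set() for _ in range(n_centres)]
    assigned = {}
    pairs = sorted(combinations(sorted(datasets), 2),
                   key=lambda p: dependency(*p), reverse=True)
    for a, b in pairs:
        if dependency(a, b) == 0:
            break  # remaining pairs share no tasks
        # prefer a centre that already holds one of the pair
        target = next((i for i, c in enumerate(centres)
                       if a in c or b in c), None)
        if target is None:
            target = min(range(n_centres), key=lambda i: len(centres[i]))
        for d in (a, b):
            if d not in assigned:
                centres[target].add(d)
                assigned[d] = target
    for d in sorted(datasets):  # independent data sets: balance load
        if d not in assigned:
            target = min(range(n_centres), key=lambda i: len(centres[i]))
            centres[target].add(d)
            assigned[d] = target
    return assigned

all_ds = {d for inputs in task_inputs.values() for d in inputs}
placement = place(all_ds, n_centres=2)
# d1, d2, d3 are co-used (via t1, t2, t4) and end up together;
# d4, d5 (used only by t3) form a second cluster.
```

A real implementation would also weight pairs by data-set size and cap each centre's storage, but the co-usage matrix above captures the core placement signal the abstract describes.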
