
Towards optimized scheduling for data-intensive scientific workflow in multiple datacenter environment


Abstract

In the big data era, scientific workflow exhibits the characteristics of data intensity and becomes increasingly popular in scientific domains. Efficient scheduling of data-intensive scientific workflow in a multiple datacenter (DC) environment has been a long-standing challenge. Most of previous work on data-intensive scientific workflow scheduling primarily focused on the optimization of reducing the volumes of data transfer between workflow tasks. In this paper, novel scheduling strategies for the execution of data-intensive scientific workflow in multi-DC environment are proposed aiming at the optimization of the overall data transfer time. A novel DC selection approach is proposed to minimize the number of DCs having enough storage capacity for the execution of scientific workflow as well as optimized inter-DC network bandwidth for efficient data transfer between workflow tasks. A k-means clustering-based data placement strategy is adopted to intelligently place the initial data of scientific workflow thereby reducing the volume of initial data transfer between different DCs. A multilevel task replication scheduling strategy is invented to reduce the volumes of intermediate data transfer between DCs during the runtime of the scientific workflow. Simulations spanning a broad range of scientific workflow and multi-DC settings are performed in order to verify the proposed approaches. The numerical results show that our combined scheduling strategy significantly reduces the overall data transfer time and data transfer volume when scientific workflow is scheduled in multi-DC environment.

Bibliographic record

  • Source
    Concurrency, practice and experience | 2015, Issue 18 | pp. 5606-5622 | 17 pages
  • Author affiliations

    School of Computer Science and Engineering, Southeast University, Nanjing 211189, China;

    School of Computer Science and Engineering, Southeast University, Nanjing 211189, China;

    School of Computer Science and Engineering, Southeast University, Nanjing 211189, China;

    School of Computer Science and Engineering, Southeast University, Nanjing 211189, China;

    School of Computer Science and Engineering, Southeast University, Nanjing 211189, China;

  • Indexing information
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification
  • Keywords

    scientific workflow; scheduling; multiple datacenter;
