Data Placement and Task Scheduling Optimization for Data Intensive Scientific Workflow in Multiple Data Centers Environment

Abstract

Running data-intensive scientific workflows across multiple data centers involves massive data transfers, which make real workflow applications inefficient for scientists. Taking both data size and data dependency into account, we propose a k-means-based initial data placement strategy that, at the workflow preparation stage, places the most closely related initial data sets in the same data center. During workflow execution, we analyze the interdependencies between data sets and tasks and adopt a multilevel task replication strategy to reduce the volume of intermediate data transferred. Simulation results show that the proposed strategies can effectively reduce data transfer among data centers and improve the performance of data-intensive scientific workflows.
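The abstract does not specify the feature representation, distance measure, or capacity handling used for the k-means placement, so the following is only a hypothetical sketch of the general idea, not the authors' algorithm. It assumes each initial data set is described by which tasks read it, clusters the data sets with size-weighted k-means into one cluster per data center (the helper names `usage_vectors`, `weighted_kmeans`, and `cross_dc_transfer` are invented for illustration), ignores data-center storage capacity, and measures the result by assuming each task runs in the data center that already holds most of its input bytes.

```python
from typing import List, Sequence, Set

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def usage_vectors(task_inputs: Sequence[Set[int]], n_data: int) -> List[Vector]:
    """Assumed feature: entry t of data set d's vector is 1.0 if task t reads d."""
    return [[1.0 if d in inputs else 0.0 for inputs in task_inputs] for d in range(n_data)]


def farthest_point_init(points: Sequence[Vector], k: int) -> List[Vector]:
    """Deterministic seeding: start from point 0, then repeatedly add the point
    farthest from every centroid chosen so far."""
    chosen = [0]
    while len(chosen) < k:
        far = max(range(len(points)),
                  key=lambda i: min(sq_dist(points[i], points[j]) for j in chosen))
        chosen.append(far)
    return [list(points[i]) for i in chosen]


def weighted_kmeans(points: Sequence[Vector], weights: Sequence[float],
                    k: int, max_iters: int = 100) -> List[int]:
    """Lloyd's k-means where each data set pulls its centroid in proportion to
    its size (an assumed way of 'considering data size')."""
    centroids = farthest_point_init(points, k)
    assign = [-1] * len(points)
    for _ in range(max_iters):
        # Assignment step: nearest centroid, ties go to the lowest index.
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_assign == assign:
            break  # converged: no data set changed cluster
        assign = new_assign
        # Update step: size-weighted mean of each cluster's members.
        for c in range(k):
            members = [i for i, a in enumerate(assign) if a == c]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            total = sum(weights[i] for i in members)
            centroids[c] = [sum(weights[i] * points[i][dim] for i in members) / total
                            for dim in range(len(points[0]))]
    return assign


def cross_dc_transfer(task_inputs: Sequence[Set[int]], placement: Sequence[int],
                      sizes: Sequence[float], k: int) -> float:
    """Data moved between data centers, assuming each task runs in the data
    center that already holds the largest share of its input bytes."""
    moved = 0
    for inputs in task_inputs:
        local = [0] * k
        for d in inputs:
            local[placement[d]] += sizes[d]
        moved += sum(local) - max(local)
    return moved


sizes = [10, 20, 5, 30, 15]             # hypothetical data-set sizes (e.g. GB)
task_inputs = [{0, 1}, {1, 2}, {3, 4}]  # task t reads these initial data sets
k = 2                                   # number of data centers
place = weighted_kmeans(usage_vectors(task_inputs, len(sizes)), sizes, k)
print(place, cross_dc_transfer(task_inputs, place, sizes, k))
print(cross_dc_transfer(task_inputs, [0, 1, 0, 1, 0], sizes, k))  # round-robin baseline
```

On this toy input the clustering puts data sets 0, 1, and 2 (shared by the first two tasks) in one data center and data sets 3 and 4 in the other, so no input data crosses data centers, whereas a dependency-blind round-robin placement moves 30 units. A practical placement would also have to respect per-data-center storage limits, which plain k-means does not enforce.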