Journal of Grid Computing

Science in the Cloud: Allocation and Execution of Data-Intensive Scientific Workflows


Abstract

An important challenge for the adoption of cloud computing in the scientific community remains the efficient allocation and execution of data-intensive scientific workflows to reduce execution time and the size of transferred data. The transferred data overhead is becoming significant with emerging scientific workflows that have input/output files and intermediate data products ranging in the hundreds of gigabytes. The allocation of scientific workflows on public clouds can be described through a variety of perspectives and parameters, and has been proved to be NP-complete. This paper proposes an evolutionary approach for task allocation on public clouds considering data transfer and execution time. In our framework, a solution is represented using an allocation chromosome that encodes the allocation of tasks to nodes, and an ordering chromosome that defines the execution order according to the scientific workflow representation. We propose a multi-objective optimization that relies on a cloud cost model and employs tailored evolution operators. Starting from a population of possible solutions, we employ crossover and mutation operators on both chromosomes, aiming to optimize the data transferred between nodes as well as the total workflow runtime. The crossover operators combine parts of solutions to reduce data overhead, whereas the mutation operators swap parts of the same chromosome according to pre-defined rules. Our experimental study compares the proposed approach with current state-of-the-art approaches using synthetic and real-life workflows. Our algorithm performs similarly to existing heuristics for small workflows and shows up to 80% improvement for larger synthetic workflows. To further validate our approach, we compare the allocation and scheduling it produces with those obtained by popular scientific workflow managers when real workflows with hundreds of tasks are executed on a public cloud. The results show a 10% improvement in runtime over existing schedulers, caused by an 80% reduction in transferred data and optimized allocation and ordering of tasks. This improved data locality has a greater impact, as it can be employed to improve and study data provenance and facilitate data persistence for scientific workflows.
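The two-chromosome encoding described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy `WORKFLOW` graph and its data sizes, the function names, the one-point crossover, and the choice of "parents must precede children" as the pre-defined rule that a swap mutation has to respect. The paper's cloud cost model and its tailored operators are not specified in the abstract and are not reproduced here.

```python
import random

# Hypothetical toy workflow: task -> list of (parent_task, data_size_gb).
WORKFLOW = {
    "t0": [],
    "t1": [("t0", 10)],
    "t2": [("t0", 5)],
    "t3": [("t1", 20), ("t2", 1)],
}


def transferred_data(allocation):
    """Sum the sizes of edges whose two tasks are placed on different nodes."""
    return sum(
        size
        for task, parents in WORKFLOW.items()
        for parent, size in parents
        if allocation[task] != allocation[parent]
    )


def is_valid_order(order):
    """An ordering chromosome is valid if every parent precedes its child."""
    position = {task: i for i, task in enumerate(order)}
    return all(
        position[parent] < position[task]
        for task, parents in WORKFLOW.items()
        for parent, _ in parents
    )


def swap_mutation(order, rng):
    """Swap two positions; fall back to the original if precedence breaks."""
    i, j = rng.sample(range(len(order)), 2)
    candidate = list(order)
    candidate[i], candidate[j] = candidate[j], candidate[i]
    return candidate if is_valid_order(candidate) else list(order)


def one_point_crossover(parent_a, parent_b, rng):
    """Build a child allocation from a prefix of one parent and the rest of the other."""
    tasks = list(WORKFLOW)
    cut = rng.randrange(1, len(tasks))
    return {
        task: (parent_a[task] if i < cut else parent_b[task])
        for i, task in enumerate(tasks)
    }


if __name__ == "__main__":
    rng = random.Random(0)
    allocation = {"t0": 0, "t1": 0, "t2": 1, "t3": 0}  # task -> node id
    order = ["t0", "t1", "t2", "t3"]                    # execution order
    print("data moved between nodes (GB):", transferred_data(allocation))
    print("mutated order:", swap_mutation(order, rng))
```

In this sketch the allocation chromosome is a task-to-node mapping and the ordering chromosome is a list of tasks. A full evolutionary loop would evaluate each candidate on both transferred data and estimated runtime and keep the non-dominated ones, which is omitted here.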
