Science in the Cloud: Allocation and Execution of Data-Intensive Scientific Workflows

Claudia Szabo; Quan Z. Sheng; Trent Kroeger; Yihong Zhang; Jian Yu

Abstract

An important challenge for the adoption of cloud computing in the scientific community remains the efficient allocation and execution of data-intensive scientific workflows so as to reduce execution time and the size of transferred data. The overhead of transferred data is becoming significant with emerging scientific workflows whose input/output files and intermediate data products range in the hundreds of gigabytes. The allocation of scientific workflows on public clouds can be described through a variety of perspectives and parameters, and has been proved to be NP-complete. This paper proposes an evolutionary approach for task allocation on public clouds that considers data transfer and execution time. In our framework, a solution is represented using an allocation chromosome that encodes the allocation of tasks to nodes, and an ordering chromosome that defines the execution order according to the scientific workflow representation. We propose a multi-objective optimization that relies on a cloud cost model and employs tailored evolution operators. Starting from a population of possible solutions, we employ crossover and mutation operators on both chromosomes, aiming to optimize the data transferred between nodes as well as the total workflow runtime. The crossover operators combine parts of solutions to reduce data overhead, whereas the mutation operators swap parts of the same chromosome according to pre-defined rules. Our experimental study compares the proposed approach with current state-of-the-art approaches using synthetic and real-life workflows. Our algorithm performs similarly to existing heuristics for small workflows and shows improvements of up to 80% for larger synthetic workflows. To further validate our approach, we compare the allocation and scheduling obtained by our approach with those obtained by popular scientific workflow managers when real workflows with hundreds of tasks are executed on a public cloud. The results show a 10% improvement in runtime over existing schedulers, caused by an 80% reduction in transferred data and optimized allocation and ordering of tasks. This improved data locality has a greater impact, as it can be employed to improve and study data provenance and facilitate data persistence for scientific workflows.
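
The two-chromosome encoding described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy workflow, the runtimes, the bandwidth, the weighted-sum cost (used here in place of the paper's multi-objective optimization and cloud cost model), and every function name are assumptions made only for illustration.

```python
import random

# Hypothetical toy workflow: EDGES[(parent, child)] = data size in GB.
EDGES = {(0, 1): 10, (0, 2): 20, (1, 3): 5, (2, 3): 15}
RUNTIME = [4, 3, 6, 2]      # assumed task runtimes (time units)
NUM_TASKS, NUM_NODES = 4, 2
BANDWIDTH = 5               # assumed GB per time unit between nodes


def transferred_data(alloc):
    """Data moved between different nodes for an allocation chromosome."""
    return sum(size for (p, c), size in EDGES.items() if alloc[p] != alloc[c])


def is_valid_order(order):
    """An ordering chromosome is valid if every parent precedes its child."""
    pos = {task: i for i, task in enumerate(order)}
    return all(pos[p] < pos[c] for p, c in EDGES)


def makespan(alloc, order):
    """Simulate execution: each node runs its tasks one at a time, in order."""
    node_free = [0.0] * NUM_NODES
    finish = {}
    for task in order:
        ready = 0.0
        for (p, c), size in EDGES.items():
            if c == task:
                delay = size / BANDWIDTH if alloc[p] != alloc[c] else 0.0
                ready = max(ready, finish[p] + delay)
        start = max(ready, node_free[alloc[task]])
        finish[task] = start + RUNTIME[task]
        node_free[alloc[task]] = finish[task]
    return max(finish.values())


def cost(individual):
    """Assumed weighted sum of runtime and transferred data (a simplification)."""
    alloc, order = individual
    return makespan(alloc, order) + 0.1 * transferred_data(alloc)


def crossover_alloc(a, b, rng):
    """Single-point crossover on two allocation chromosomes."""
    cut = rng.randrange(1, NUM_TASKS)
    return a[:cut] + b[cut:]


def mutate_alloc(alloc, rng):
    """Move one randomly chosen task to a randomly chosen node."""
    new = alloc[:]
    new[rng.randrange(NUM_TASKS)] = rng.randrange(NUM_NODES)
    return new


def mutate_order(order, rng):
    """Swap two adjacent tasks; keep the swap only if dependencies still hold."""
    i = rng.randrange(NUM_TASKS - 1)
    new = order[:]
    new[i], new[i + 1] = new[i + 1], new[i]
    return new if is_valid_order(new) else order


def evolve(generations=50, pop_size=20, seed=0):
    """Keep the better half each generation and refill it with offspring."""
    rng = random.Random(seed)
    base_order = list(range(NUM_TASKS))  # 0, 1, 2, 3 is a valid topological order
    pop = [([rng.randrange(NUM_NODES) for _ in range(NUM_TASKS)], base_order[:])
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            (a1, o1), (a2, _) = rng.sample(parents, 2)
            child_alloc = mutate_alloc(crossover_alloc(a1, a2, rng), rng)
            children.append((child_alloc, mutate_order(o1, rng)))
        pop = parents + children
    return min(pop, key=cost)
```

Calling `evolve()` returns one `(allocation, ordering)` pair. The ordering mutation rejects any swap that would place a child before its parent, which is one simple way to read "pre-defined rules"; the rules actually used in the paper are not given in the abstract.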

Bibliographic record

Source
Journal of Grid Computing | 2014, Issue 2 | 20 pages
Authors
Claudia Szabo; Quan Z. Sheng; Trent Kroeger; Yihong Zhang; Jian Yu
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
Data-intensive workflows; Cloud computing; Scheduling; Allocation; Evolutionary computation

Similar literature

1. Science in the Cloud: Allocation and Execution of Data-Intensive Scientific Workflows [J]. Claudia Szabo, Quan Z. Sheng, Trent Kroeger. Journal of Grid Computing. 2014, Issue 2
2. Parameterized specification, configuration and execution of data-intensive scientific workflows [J]. Kumar V.S., Kurc T., Ratnakar V. Cluster Computing. 2010, Issue 3
3. XML Database Support for Distributed Execution of Data-intensive Scientific Workflows [J]. Shannon Hastings, Matheus Ribeiro, Stephen Langella. SIGMOD Record. 2005, Issue 3
4. New Execution Paradigm for Data-Intensive Scientific Workflows [C]. Mahmoud El-Gayyar, Yan Leng, Serge Shumilov. IEEE Congress on Services. 2009
5. Specification, configuration and execution of data-intensive scientific applications [D]. Kumar, Vijay S. 2010
6. Parameterized Specification Configuration and Execution of Data-Intensive Scientific Workflows [O]. Vijay S. Kumar, Tahsin Kurc, Varun Ratnakar
7. Re-provisioning of cloud-based execution infrastructure using the cloud-aware provenance to facilitate scientific workflow execution reproducibility [O]. Khawar H., Munir K., McClatchey R. 2016
