Procedia Computer Science

Facilitating the Reproducibility of Scientific Workflows with Execution Environment Specifications


Abstract

Scientific workflows are designed to solve complex scientific problems and accelerate scientific progress. Ideally, scientific workflows should improve the reproducibility of scientific applications by making it easier to share and reuse workflows between scientists. However, scientists often find it difficult to reuse others’ workflows, which is known as workflow decay. In this paper, we explore the challenges in reproducing scientific workflows, and propose a framework for facilitating the reproducibility of scientific workflows at the task level by giving scientists complete control over the execution environments of the tasks in their workflows and integrating execution environment specifications into scientific workflow systems. Our framework allows dependencies to be archived in basic units of OS image, software and data instead of gigantic all-in-one images. We implement a prototype of our framework by integrating Umbrella, an execution environment creator, into Makeflow, a scientific workflow system. To evaluate our framework, we use it to run two bioinformatics scientific workflows, BLAST and BWA. The execution environment of the tasks in each workflow is specified as an Umbrella specification file, and sent to execution nodes where Umbrella is used to create the specified environment for running the tasks. For each workflow we evaluate the size of the Umbrella specification file, the time and space overheads of creating execution environments using Umbrella, and the heterogeneity of execution nodes contributing to each workflow. The evaluation results show that our framework improves the utilization of heterogeneous computing resources, and improves the portability and reproducibility of scientific workflows.
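
The idea of archiving dependencies as separate units (OS image, software, data) rather than as one all-in-one image per workflow can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the dictionary layout is not Umbrella's actual specification format, and the component names and sizes are invented placeholders, not figures from the paper.

```python
import json

# Hypothetical archive units and sizes in MB (placeholders, not measured data).
COMPONENT_SIZES_MB = {
    "os/example-linux": 500,
    "software/blast": 50,
    "software/bwa": 10,
    "data/blast-db": 300,
    "data/bwa-ref": 200,
}

def make_spec(os_image, software, data):
    """Build a task-level environment specification as a plain dict.

    This layout is illustrative only; it is not the real Umbrella schema.
    """
    return {"os": os_image, "software": sorted(software), "data": sorted(data)}

def components(spec):
    """Return the set of archive units that a specification refers to."""
    return {spec["os"], *spec["software"], *spec["data"]}

def all_in_one_storage(specs):
    """Total storage if each spec is archived as its own monolithic image."""
    return sum(sum(COMPONENT_SIZES_MB[c] for c in components(s)) for s in specs)

def per_unit_storage(specs):
    """Total storage if each distinct unit is archived once and shared."""
    shared = set().union(*(components(s) for s in specs))
    return sum(COMPONENT_SIZES_MB[c] for c in shared)

# Two hypothetical task environments that share the same OS image.
blast = make_spec("os/example-linux", ["software/blast"], ["data/blast-db"])
bwa = make_spec("os/example-linux", ["software/bwa"], ["data/bwa-ref"])

if __name__ == "__main__":
    print(json.dumps(blast, indent=2))
    print("all-in-one images:", all_in_one_storage([blast, bwa]), "MB")
    print("per-unit archive: ", per_unit_storage([blast, bwa]), "MB")
```

In this toy setup the shared OS image is stored once under the per-unit scheme, which is the kind of saving the abstract alludes to. The specification itself remains a small text document that can travel with each task.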
