A cost-effective strategy for intermediate data storage in scientific cloud workflow systems

Abstract

Many scientific workflows are data intensive: a large volume of intermediate data is generated during their execution. Some valuable intermediate data need to be stored for sharing or reuse. Traditionally, the datasets to keep have been selected manually, according to the system's storage capacity. Now that running scientific applications in the cloud has become popular, more intermediate data can be stored in scientific cloud workflow systems under a pay-for-use model. In this paper, we build an Intermediate data Dependency Graph (IDG) from the data provenance in scientific workflows. Based on the IDG, we develop a novel intermediate data storage strategy that can reduce the cost of the scientific cloud workflow system by automatically storing the most appropriate intermediate datasets in cloud storage. We utilise Amazon's cost model and apply the strategy to an astrophysics pulsar searching scientific workflow for evaluation. The results show that our strategy can significantly reduce the overall cost of scientific cloud workflow execution.
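The trade-off the abstract describes, paying to keep an intermediate dataset versus paying to regenerate it on demand, can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes a simple rule: store a dataset when its monthly storage cost is lower than its expected monthly regeneration cost, where regeneration also has to recompute any deleted predecessors in the dependency graph. The prices, the `Dataset` fields, the function names and the three-dataset example are all hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative placeholder prices, not taken from the source.
STORAGE_PRICE = 0.15   # dollars per GB per month
COMPUTE_PRICE = 0.10   # dollars per CPU hour


@dataclass
class Dataset:
    name: str
    size_gb: float          # size of the dataset
    gen_hours: float        # CPU hours to produce it from its direct predecessors
    uses_per_month: float   # expected number of uses per month
    parents: list = field(default_factory=list)  # names of direct predecessors


def regeneration_cost(name, graph, stored):
    """Cost to regenerate `name`, recomputing any predecessors that are not stored."""
    d = graph[name]
    cost = d.gen_hours * COMPUTE_PRICE
    for parent in d.parents:
        if parent not in stored:
            cost += regeneration_cost(parent, graph, stored)
    return cost


def choose_stored(graph, topological_order):
    """Greedy pass: keep a dataset if storing it is cheaper per month than regenerating it."""
    stored = set()
    for name in topological_order:
        d = graph[name]
        storage_rate = d.size_gb * STORAGE_PRICE
        regen_rate = regeneration_cost(name, graph, stored) * d.uses_per_month
        if storage_rate < regen_rate:
            stored.add(name)
    return stored


# Hypothetical linear dependency chain: d1 -> d2 -> d3
graph = {
    "d1": Dataset("d1", size_gb=100, gen_hours=1, uses_per_month=0.5),
    "d2": Dataset("d2", size_gb=1, gen_hours=20, uses_per_month=1, parents=["d1"]),
    "d3": Dataset("d3", size_gb=50, gen_hours=2, uses_per_month=1, parents=["d2"]),
}

print(sorted(choose_stored(graph, ["d1", "d2", "d3"])))  # ['d2']
```

In this example only `d2` is kept: it is small but expensive to compute, while `d1` and `d3` are large and cheap to regenerate. A greedy pass like this is only a local heuristic and makes no claim to minimise the total cost over the whole graph.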
