Conference: Scientific and Statistical Database Management

Improving Workflow Fault Tolerance through Provenance-Based Recovery

Abstract

Scientific workflow systems are frequently used to execute long-running computational pipelines that are prone to premature termination due to network failures, server outages, and other faults. Researchers have presented approaches that provide fault tolerance for portions of specific workflows, but no existing solution handles faults that terminate the workflow engine itself while it executes a mix of stateless and stateful workflow components. Here we present a general framework for efficiently resuming workflow execution using information that workflow systems commonly capture to record data provenance. Our approach enables fast workflow replay using only this commonly recorded provenance data. We also propose a checkpoint extension to standard provenance models that significantly reduces the computation needed to reset the workflow to a consistent state, resulting in much shorter re-execution times. Our work generalizes the rescue-DAG approach used by DAGMan to richer workflow models that may contain stateless and stateful multi-invocation actors as well as workflow loops.
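The abstract does not give the framework's data model or algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a provenance log that records the inputs and outputs of each completed actor invocation, and a hypothetical checkpoint record that stores a stateful actor's state after a given invocation. All names (`Invocation`, `Checkpoint`, `recorded_outputs`, `recover_state`) are illustrative and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Invocation:
    """One completed actor invocation, as a provenance log might record it."""
    actor: str
    index: int                  # 0-based invocation counter per actor (assumed)
    inputs: Tuple[Any, ...]
    outputs: Tuple[Any, ...]


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical record: the actor's state right after invocation `index`."""
    actor: str
    index: int
    state: Any


# A stateful actor is modelled as step(state, inputs) -> (new_state, outputs).
Step = Callable[[Any, Tuple[Any, ...]], Tuple[Any, Tuple[Any, ...]]]


def recorded_outputs(actor: str, index: int,
                     log: List[Invocation]) -> Optional[Tuple[Any, ...]]:
    """Stateless case: reuse logged outputs instead of re-running the actor."""
    for inv in log:
        if inv.actor == actor and inv.index == index:
            return inv.outputs
    return None  # not in the log, so the invocation must actually be executed


def recover_state(actor: str, step: Step, initial_state: Any,
                  log: List[Invocation],
                  checkpoints: List[Checkpoint]) -> Tuple[Any, int]:
    """Stateful case: rebuild the actor's state after an engine crash.

    Starts from the latest checkpoint if one exists, otherwise from the
    initial state, then replays only the logged invocations that follow.
    Returns (recovered_state, number_of_invocations_replayed).
    """
    own = [c for c in checkpoints if c.actor == actor]
    latest = max(own, key=lambda c: c.index, default=None)
    state = latest.state if latest is not None else initial_state
    start = latest.index + 1 if latest is not None else 0

    to_replay = sorted(
        (inv for inv in log if inv.actor == actor and inv.index >= start),
        key=lambda inv: inv.index,
    )
    for inv in to_replay:
        state, _ = step(state, inv.inputs)  # outputs are already in the log
    return state, len(to_replay)


def running_sum(state: int, inputs: Tuple[Any, ...]) -> Tuple[int, Tuple[int, ...]]:
    """Toy stateful actor: keeps a running total of its inputs."""
    new_state = state + inputs[0]
    return new_state, (new_state,)


log = [
    Invocation("sum", 0, (1,), (1,)),
    Invocation("sum", 1, (2,), (3,)),
    Invocation("sum", 2, (3,), (6,)),
    Invocation("sum", 3, (4,), (10,)),
]
checkpoints = [Checkpoint("sum", 1, 3)]

with_checkpoint = recover_state("sum", running_sum, 0, log, checkpoints)
without_checkpoint = recover_state("sum", running_sum, 0, log, [])
```

In this toy run both calls recover the same state, but the checkpointed recovery replays two invocations instead of four, which illustrates why a checkpoint extension can shorten the time needed to reach a consistent state.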
