Failure analysis of distributed scientific workflows executing in the cloud

Abstract

This work presents models that characterize failures observed while executing large scientific applications on Amazon EC2, using scientific workflows as the underlying abstraction for representing applications. As scientific workflows scale to hundreds of thousands of distinct tasks, failures caused by software and hardware faults become increasingly common. We study job failure models built from data that our Stampede framework collected from four scientific applications. In particular, we show that a Naive Bayes classifier can accurately predict the failure probability of jobs. The models let us predict job failures for a given execution resource and then use those predictions for two higher-level goals: (1) suggesting a better job assignment, and (2) giving workflow component developers quantitative feedback on the robustness of their application code.
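The abstract's central idea, estimating a job's failure probability with a Naive Bayes classifier and reusing that estimate to choose an execution resource, can be illustrated with a small sketch. This is not the authors' implementation: the two categorical features (`job_type`, `host`), the add-one (Laplace) smoothing, the helper `suggest_host`, and the toy job history are all hypothetical choices for illustration and do not reflect the Stampede data schema.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical training record: (features, failed), where features maps a
# feature name such as "job_type" or "host" to a categorical value.
Record = Tuple[Dict[str, str], bool]


class CategoricalNaiveBayes:
    """Binary categorical Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_counts: Counter = Counter()
        # counts[label][feature][value] -> number of training jobs
        self.counts = defaultdict(lambda: defaultdict(Counter))
        # distinct values seen per feature, needed for the smoothing denominator
        self.values = defaultdict(set)

    def fit(self, records: Iterable[Record]) -> "CategoricalNaiveBayes":
        for features, failed in records:
            self.class_counts[failed] += 1
            for name, value in features.items():
                self.counts[failed][name][value] += 1
                self.values[name].add(value)
        return self

    def _joint(self, features: Dict[str, str], label: bool) -> float:
        # P(label) * product over features of P(feature = value | label)
        total = sum(self.class_counts.values())
        n_label = self.class_counts[label]
        p = (n_label + self.alpha) / (total + 2 * self.alpha)
        for name, value in features.items():
            k = len(self.values[name] | {value})  # include unseen values
            p *= (self.counts[label][name][value] + self.alpha) / (n_label + self.alpha * k)
        return p

    def failure_probability(self, features: Dict[str, str]) -> float:
        fail = self._joint(features, True)
        ok = self._joint(features, False)
        return fail / (fail + ok)


def suggest_host(model: CategoricalNaiveBayes, job_type: str, hosts: List[str]) -> str:
    # Hypothetical assignment rule: pick the host with the lowest predicted
    # failure probability for this job type.
    return min(hosts, key=lambda h: model.failure_probability({"job_type": job_type, "host": h}))


# Toy, made-up job history (not data from the paper).
history: List[Record] = [
    ({"job_type": "mProject", "host": "vm-a"}, False),
    ({"job_type": "mProject", "host": "vm-a"}, False),
    ({"job_type": "mProject", "host": "vm-b"}, True),
    ({"job_type": "mProject", "host": "vm-b"}, True),
    ({"job_type": "mAdd", "host": "vm-a"}, False),
    ({"job_type": "mAdd", "host": "vm-b"}, True),
]

if __name__ == "__main__":
    model = CategoricalNaiveBayes().fit(history)
    print(suggest_host(model, "mProject", ["vm-a", "vm-b"]))  # vm-a
```

Because Naive Bayes treats features as conditionally independent given the outcome, each feature contributes one smoothed per-class frequency; smoothing keeps a never-seen host from receiving a probability of exactly 0 or 1.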

Bibliographic record

Source: 2012 8th International Conference on Network and Service Management, 2012, pp. 46-54 (9 pages)
Conference location: Las Vegas, NV, USA
Authors: Samak Taghrid; Gunter Dan; Goode Monte; Deelman Ewa; Juve Gideon; Silva Fabio; Vahi Karan
Author affiliation: Lawrence Berkeley National Laboratory, CA, USA
Original format: PDF
Text language: English
Chinese Library Classification: Computer networks

Similar documents

1. Bioinformatics recipes: creating, executing and distributing reproducible data analysis workflows [J]. Natay Aberra, Aswathy Sebastian, Aaron P. Maloy. BMC Bioinformatics, 2020, no. 1
2. A Declarative Optimization Engine for Resource Provisioning of Scientific Workflows in Geo-Distributed Clouds [J]. Amelie Chi Zhou, Bingsheng He, Xuntao Cheng. IEEE Transactions on Parallel and Distributed Systems, 2017, no. 3
3. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud [J]. Abraham Nieva de la Hidalga, Alan Williams, Aleksandra Nenadic. Nucleic Acids Research, 2013, no. W1
4. Failure analysis of distributed scientific workflows executing in the cloud [C]. Samak Taghrid, Gunter Dan, Goode Monte. International Conference on Network and Service Management, 2012
5. Efficient scientific workflow scheduling in cloud environment [D]. Cao, Fei. 2014
6. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud [O]. Katherine Wolstencroft, Robert Haines, Donal Fellows. 2013
7. Executing Large Scale Scientific Workflows in Public Clouds [O]. Jiang Qingye. 2015