Conference: 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing

The Failure Trace Archive: Enabling Comparative Analysis of Failures in Diverse Distributed Systems

Abstract

With the increasing functionality and complexity of distributed systems, resource failures are inevitable. While numerous models and algorithms for dealing with failures exist, the lack of public trace data sets and tools has prevented meaningful comparisons. To facilitate the design, validation, and comparison of fault-tolerant models and algorithms, we have created the Failure Trace Archive (FTA) as an online public repository of availability traces taken from diverse parallel and distributed systems. Our main contributions in this study are the following. First, we describe the design of the archive, in particular the rationale of the standard FTA format, and the design of a toolbox that facilitates automated analysis of trace data sets. Second, applying the toolbox, we present a uniform comparative analysis with statistics and models of failures in nine distributed systems. Third, we show how different interpretations of these data sets can result in different conclusions. This emphasizes the critical need for the public availability of trace data and methods for their analysis.