Journal: Concurrency and Computation: Practice and Experience

A multilevel fault model for integrated parallel fault-tolerant systems

Abstract

The advent of multithreaded, multicore, and manycore systems has led to a leap in performance. Such systems are called integrated when there are electrical and physical dependencies between different functional units, that is, when multiple cores are integrated on a single die. Typically, such systems share a common interface to the outside world, which is a potential single point of failure. This work addresses several questions concerning fault propagation. First, if one component within a core fails, how likely is faulty behavior of other components on the same or other cores? Second, what is the overall reliability of such a system? It is important to answer these questions before implementation, because the total cost of a reliable product should be as small as possible. Our approach combines different abstraction levels in one multilevel fault model. The first level is the physical level, which covers the physical effects of a fault. Validation at this level can be omitted if the modeling is precise enough. The second level is a component and routing model in which current is represented as a logic value. The last level is the behavioral modeling of components by finite state machines. Because existing parallel systems differ widely in number and nature, a theoretical approach is followed. The model can cover the whole range of parallel devices, from field-programmable gate arrays to multicore CPUs and manycore graphics processing units. It can therefore help to improve the reliability of current and future parallel fault-tolerant systems by identifying the underlying bottlenecks. The model is demonstrated by applying it to a field-programmable gate array, where it identifies switchboxes as the main reliability bottleneck.

Bibliographic details

  • Author

    Bernhard Fechner

  • Author affiliation

    Department of Systems and Networking, University of Augsburg, Universitätsstr. 6a, 86159 Augsburg, Germany

  • Original format: PDF
  • Language: English
