Journal: IEEE Transactions on Reliability

Evaluation of Software-Implemented Fault-Tolerance (SIFT) Approach in Gracefully Degradable Multi-Computer Systems

Abstract

This paper presents an analytical method for evaluating the reliability improvement that Software-Implemented Fault-Tolerance (SIFT) brings to a multi-computer system of any size. The method is expressed in terms of the equivalent failure rate $\Gamma$, the single-node failure rate $\lambda$, the number of nodes in the system $N$, the repair rate $\mu$, the fault coverage factor $c$, the reconfiguration rate $\delta$, and the percentages of blocking faults $b_1$ and $b_2$. Using the proposed analytical technique, which is based on Markov chains, the impact of these parameters on reliability improvement is evaluated for a gracefully degradable multi-computer system. To validate the approach, we used a SIFT method that performs error detection at the node level, combined with a fast reconfiguration algorithm that avoids faulty nodes. The proposed method is applicable to any multi-computer system topology. The evaluation combines analytical and experimental approaches, with the analytical side built on Markov chains. The SIFT method has been successfully implemented on the nCube multi-computer system. The time overhead caused by an injected fault (reconfiguration and recomputation time) and the fault coverage factor $c$ are evaluated experimentally using nSOFIT, a parallel version of the Software Object-Oriented Fault-Injection Tool. The implemented SIFT approach is suitable for real-time applications in which time constraints must be met despite failures in the gracefully degradable multi-computer system.
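The abstract does not reproduce the Markov chain itself, so the following is only a minimal sketch of the kind of continuous-time Markov model it describes, not the paper's model. Assumptions: the system starts with $N$ nodes and keeps working while at least a hypothetical minimum `min_nodes` remain; a node fault occurs at rate $k\lambda$ with $k$ working nodes; with probability $c$ the fault is covered and the system enters a reconfiguration state that completes at rate $\delta$, otherwise it fails; a single repair facility restores one node at rate $\mu$; no second fault is modelled during reconfiguration; and the blocking-fault fractions $b_1$, $b_2$ are omitted. The mean time to failure (MTTF) is obtained by solving $Q_T \mathbf{T} = -\mathbf{1}$ over the transient states. A single equivalent failure rate could then be approximated as $1/\text{MTTF}$, though that is an assumption of this sketch, not necessarily the paper's definition of $\Gamma$.

```python
from typing import List, Optional, Tuple

State = Tuple[str, int]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            raise ValueError("singular transient generator matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def mttf(n_nodes: int, min_nodes: int, lam: float, mu: float,
         c: float, delta: float) -> float:
    """MTTF of a hypothetical gracefully degradable system (toy CTMC)."""
    # ("up", k): k nodes working; ("rec", k): reconfiguring from k to k-1 nodes.
    states: List[State] = [("up", k) for k in range(n_nodes, min_nodes - 1, -1)]
    states += [("rec", k) for k in range(n_nodes, min_nodes, -1)]
    idx = {s: i for i, s in enumerate(states)}
    size = len(states)
    q = [[0.0] * size for _ in range(size)]

    def add(src: State, dst: Optional[State], rate: float) -> None:
        # dst=None is the absorbing failure state: only the outflow is recorded.
        i = idx[src]
        q[i][i] -= rate
        if dst is not None:
            q[i][idx[dst]] += rate

    for k in range(n_nodes, min_nodes - 1, -1):
        up = ("up", k)
        if k > min_nodes:
            add(up, ("rec", k), c * k * lam)   # covered fault: reconfigure
            add(up, None, (1 - c) * k * lam)   # uncovered fault: system failure
        else:
            add(up, None, k * lam)             # any fault drops below minimum
        if k < n_nodes:
            add(up, ("up", k + 1), mu)         # single repair facility
    for k in range(n_nodes, min_nodes, -1):
        add(("rec", k), ("up", k - 1), delta)  # reconfiguration completes

    # Expected times to absorption T satisfy Q_T T = -1.
    times = solve_linear(q, [-1.0] * size)
    return times[idx[("up", n_nodes)]]
```

For a two-node toy case with full coverage, `mttf(2, 1, 1.0, 0.0, 1.0, 2.0)` gives 2.0 and adding repair with `mttf(2, 1, 1.0, 1.0, 1.0, 2.0)` gives 3.0 (up to floating-point rounding), illustrating how $\mu$ and $c$ enter the kind of reliability comparison the abstract describes. A plain linear solve is used instead of a numerical library so the sketch has no dependencies.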