Software-driven hardware configurations account for the majority of modern complex systems. The often costly failures of such systems can be attributed to software-specific failures, hardware-specific failures, or failures in software/hardware interaction. Understanding how failures propagate in a complex system is critical because, while a software component may not fail in the sense of losing function, a software operational state can cause an associated hardware failure. The least expensive phase of the product life cycle in which to address failures is the design stage. This creates a need for a design-stage analysis framework that evaluates how a combined software/hardware system behaves and how failures propagate through it.

Historical approaches to modeling the reliability of these systems have analyzed the software and hardware components separately. As a result, significant work has been done to model and analyze the reliability of each component individually. Research into failures at the hardware/software interface has focused largely on the software side, modeling the behavior of software operating on failed hardware.

This paper proposes the use of high-level system modeling approaches to model failure propagation in combined software/hardware systems. Specifically, it presents the use of the Function-Failure Identification and Propagation (FFIP) framework for system-level analysis. The framework is applied to evaluate nonlinear failure propagation within the Reaction Control System Jet Selection of the NASA space shuttle, specifically for the redundancy management system. The redundancy management software is a subset of the larger data processing software and is involved in jet selection, warning systems, and pilot control. The software component that monitors for leaks evaluates temperature data from the fuel and oxidizer injectors and flags a jet as failed by leak if the temperature data is out of bounds for three or more cycles.

The end goal is to identify the most likely and highest-cost paths for fault propagation in a complex system as an effective way to enhance system reliability. By defining functional failure propagation modes and evaluating paths, a complex system designer can evaluate the effectiveness of system monitors and compare design configurations.
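
The leak-monitoring rule described above (a jet is flagged when its injector temperature is out of bounds for three or more cycles) can be illustrated with a minimal sketch. This is not the shuttle flight software. The class name `LeakMonitor`, the bounds, and the sample readings are all hypothetical, and the sketch assumes that the three cycles must be consecutive and that the flag clears once a reading returns within bounds, neither of which the abstract states.

```python
from dataclasses import dataclass


@dataclass
class LeakMonitor:
    """Illustrative leak monitor for one jet (hypothetical, not flight code)."""

    low: float            # assumed lower temperature bound
    high: float           # assumed upper temperature bound
    threshold: int = 3    # out-of-bounds cycles needed to flag a leak
    _streak: int = 0      # consecutive out-of-bounds cycles seen so far

    def update(self, temperature: float) -> bool:
        """Process one cycle's reading; return True if the jet is flagged."""
        if self.low <= temperature <= self.high:
            # Assumption: an in-bounds reading resets the count.
            self._streak = 0
        else:
            self._streak += 1
        return self._streak >= self.threshold


# Hypothetical bounds and readings, chosen only to exercise the rule.
monitor = LeakMonitor(low=30.0, high=130.0)
readings = [70.0, 20.0, 15.0, 10.0, 75.0]
flags = [monitor.update(t) for t in readings]
print(flags)  # [False, False, False, True, False]
```

In a failure propagation analysis, a monitor like this is one of the elements whose effectiveness would be evaluated: a transient excursion of one or two cycles never raises the flag, so the choice of threshold trades false alarms against detection delay.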