Fault tolerance via N-modular software redundancy using indirect instrumentation
Abstract
Fault tolerance is provided in a computing system using a technique referred to as indirect instrumentation. In one embodiment, a number of different copies of a given target program are executed on different machines in the system. Each of the machines includes a controller for controlling the execution of the copy of the target program on that machine. The controllers communicate with a user interface of an instrumentation tool on another machine. A user specifies variables to be monitored, breakpoints, voting and recovery parameters and other information using the user interface of the instrumentation tool, and the tool communicates corresponding commands to each of the controllers for use in executing the copies. A fault is detected in one of the copies by comparing values of a user-specified variable generated by the different copies at the designated breakpoints. Upon detection of a fault in a given one of the copies, a checkpoint is taken of another one of the copies that has been determined to be operating properly, and a new copy is restarted from the checkpoint. The use of the controllers allows faults to be detected and appropriate recovery actions to be taken without modification of target program code.
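The detect-and-recover cycle described in the abstract can be illustrated with a minimal, single-process sketch. It is not the patented implementation. It assumes a strict-majority voting rule (the abstract only says that voting parameters are user-specified), models each copy's state as a plain dictionary, and uses a deep copy as a stand-in for a real process checkpoint. The names `vote`, `recover`, `states`, and the copy identifiers are hypothetical.

```python
from collections import Counter
from copy import deepcopy


def vote(values):
    """Return (majority_value, faulty_ids) for a {copy_id: value} mapping.

    Assumption: a strict majority decides. Raises ValueError otherwise.
    """
    counts = Counter(values.values())
    winner, n = counts.most_common(1)[0]
    if n * 2 <= len(values):
        raise ValueError("no majority among copies")
    faulty = sorted(cid for cid, v in values.items() if v != winner)
    return winner, faulty


def recover(states, variable):
    """Compare `variable` across copies at a breakpoint.

    Each faulty copy is replaced by a checkpoint (here, a deep copy)
    taken from a copy that agreed with the majority.
    """
    values = {cid: st[variable] for cid, st in states.items()}
    winner, faulty = vote(values)
    if faulty:
        healthy_id = next(cid for cid in sorted(states) if cid not in faulty)
        checkpoint = deepcopy(states[healthy_id])
        for cid in faulty:
            states[cid] = deepcopy(checkpoint)  # "restart" from checkpoint
    return winner, faulty


# Three copies stopped at the same breakpoint; copy_c has diverged.
states = {
    "copy_a": {"total": 42, "step": 7},
    "copy_b": {"total": 42, "step": 7},
    "copy_c": {"total": 41, "step": 7},  # simulated fault
}
winner, faulty = recover(states, "total")
print(winner, faulty)  # prints: 42 ['copy_c']
```

In the system the abstract describes, the comparison is driven by per-machine controllers that receive commands from the instrumentation tool, so the target program itself is not modified. The sketch collapses that into direct dictionary access purely for illustration.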