Techniques for dealing with hardware failures in very large networks of distributed processing elements are presented. A concept known as distributed fault-tolerance is introduced. A model of a large multiprocessor system is developed and techniques, based on this model, are given by which each processing element can correctly diagnose failures in all other processing elements in the system. The effect of varying system interconnection structures upon the extent and efficiency of the diagnosis process is discussed, and illustrated with an example of an actual system.
Finally, extensions to the model, which render it more realistic, are given and a modified version of the diagnosis procedure is presented which operates under this model.