Conference: Annual symposium on Computer Architecture; Symposium on Computer Architecture

Distributed fault-tolerance for large multiprocessor systems

Abstract

Techniques for dealing with hardware failures in very large networks of distributed processing elements are presented. A concept known as distributed fault-tolerance is introduced. A model of a large multiprocessor system is developed and techniques, based on this model, are given by which each processing element can correctly diagnose failures in all other processing elements in the system. The effect of varying system interconnection structures upon the extent and efficiency of the diagnosis process is discussed, and illustrated with an example of an actual system.

Finally, extensions to the model, which render it more realistic, are given and a modified version of the diagnosis procedure is presented which operates under this model.
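
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of distributed diagnosis, not the paper's procedure. It assumes a deliberately simple fault model that the abstract does not state: faults are permanent, a fault-free element always tests a neighbour correctly, and an element accepts diagnostic information only from neighbours it has itself tested as fault-free. The function name `diagnose`, the adjacency-dictionary representation, and the ring topology are all hypothetical choices made for illustration.

```python
def diagnose(adjacency, faulty):
    """Simulate distributed diagnosis under the simplified model above.

    adjacency: dict mapping each node to a list of its neighbours
               (undirected, no self-loops).
    faulty:    set of nodes assumed to have failed permanently.
    Returns {fault_free_node: {node: "ok" | "faulty"}}, i.e. the view
    that each fault-free node ends up holding.
    """
    healthy = [n for n in adjacency if n not in faulty]

    # Step 1: every fault-free node tests its direct neighbours.
    # Assumption: a fault-free tester always gets the correct result.
    views = {}
    for u in healthy:
        view = {u: "ok"}
        for v in adjacency[u]:
            view[v] = "faulty" if v in faulty else "ok"
        views[u] = view

    # Step 2: repeat until nothing changes. A node merges the view of
    # a neighbour only if it tested that neighbour as fault-free, so
    # whatever a faulty node might report is never used.
    changed = True
    while changed:
        changed = False
        for u in healthy:
            for v in adjacency[u]:
                if views[u][v] != "ok":
                    continue
                for node, status in views[v].items():
                    if node not in views[u]:
                        views[u][node] = status
                        changed = True
    return views


# Hypothetical 6-node ring: node i is linked to i-1 and i+1 (mod 6).
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

# One fault: the fault-free nodes stay connected, so each of them
# learns the status of all six nodes.
views = diagnose(ring, faulty={3})

# Two faults split the ring: each fault-free node can then diagnose
# only its own fragment plus the faulty nodes bordering it.
partial = diagnose(ring, faulty={1, 4})
```

In this toy model the extent of diagnosis depends directly on the interconnection structure: a ring tolerates a single fault, while two faults can cut the fault-free nodes into groups that never exchange results. A more richly connected topology would tolerate more faults under the same assumptions.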
