Source: Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing, Digest of Papers, 1998

Validation of the fault/error handling mechanisms of the Teraflops supercomputer

Abstract

The Teraflops system, the world's most powerful supercomputer, was developed by Intel Corporation for the US Department of Energy (DOE) as part of the Accelerated Strategic Computing Initiative (ASCI). The machine contains more than 9000 Intel Pentium® Pro processors and performs over one trillion floating-point operations per second. Complex hardware and software mechanisms were devised to comply with DOE's reliability requirements. This paper gives a brief description of the Teraflops system architecture and presents the validation of the fault/error handling mechanisms. The validation process was based on an enhanced version of physical fault injection at the IC pin level. An original approach was developed for assessing signal sensitivity to transient faults and the effectiveness of the fault tolerance mechanisms. Several malfunctions were uncovered by the fault injection experiments. After corrective actions were taken, the supercomputer performed according to the specification.
