Journal: IEEE Transactions on Computers

Algorithm-based fault tolerance on a hypercube multiprocessor

Abstract

This paper discusses the design of a fault-tolerant hypercube multiprocessor architecture. The authors propose detecting and locating faulty processors concurrently with the execution of parallel applications on the hypercube, using a novel scheme of algorithm-based error detection. System-level error detection mechanisms have been implemented for three parallel applications on a 16-processor Intel iPSC hypercube multiprocessor: matrix multiplication, Gaussian elimination, and fast Fourier transform. Schemes for other applications are under development. The error coverage of these schemes has been studied extensively in the presence of finite-precision arithmetic, which affects the system-level encodings. Two reconfiguration schemes are proposed for isolating faulty processors and replacing them with spare processors.
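
The abstract does not spell out the encoding used, so the following is only a sketch of one widely known form of algorithm-based error detection for matrix multiplication: checksum-encoded operands. It assumes a column-checksum matrix multiplied by a row-checksum matrix, and an absolute tolerance `tol` to absorb finite-precision rounding. All function names and the tolerance value are hypothetical and are not taken from the paper. The sketch also runs on a single machine; mapping a flagged element to a faulty hypercube processor would depend on how the data is distributed, which the abstract does not describe.

```python
# Sketch only: checksum-encoded matrix multiplication (an assumed encoding,
# not taken from the abstract). All names here are hypothetical.

def column_checksum(a):
    """Return a copy of a with one extra row holding each column's sum."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]

def row_checksum(b):
    """Return a copy of b with one extra column holding each row's sum."""
    return [row[:] + [sum(row)] for row in b]

def matmul(a, b):
    """Plain product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def check(c_full, tol=1e-9):
    """Return (bad_rows, bad_cols): data rows/columns whose checksum disagrees.

    tol absorbs rounding error from finite-precision arithmetic. Too small a
    value raises false alarms; too large a value hides real errors.
    """
    n = len(c_full) - 1      # number of data rows
    m = len(c_full[0]) - 1   # number of data columns
    bad_rows = [i for i in range(n)
                if abs(sum(c_full[i][:m]) - c_full[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(c_full[i][j] for i in range(n)) - c_full[n][j]) > tol]
    return bad_rows, bad_cols

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c_full = matmul(column_checksum(a), row_checksum(b))
print(check(c_full))   # ([], []) : checksums are consistent

c_full[0][1] += 0.5    # inject an error into one data element
print(check(c_full))   # ([0], [1]) : the error sits at row 0, column 1
```

A single corrupted data element shows up as exactly one inconsistent row and one inconsistent column, and their intersection locates it. The choice of `tol` is where finite-precision arithmetic affects coverage, which is the trade-off the abstract says was studied.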
