Journal: International Journal of Grid and High Performance Computing

Analysis and Evaluation of a New Algorithm Based Fault Tolerance for Computing Systems


Abstract

In this paper, the authors present a new approach to algorithm-based fault tolerance (ABFT) for high-performance computing systems. The ABFT approach transforms a system that does not tolerate a specific type of fault, called the fault-intolerant system, into a system that provides a specific level of fault tolerance, namely recovery. ABFT techniques that detect errors rely on comparing parity values computed in two ways: the input parity values are processed in parallel to produce output parity values, which are then compared with parity values regenerated from the original processed outputs. Convolutional codes can supply the redundancy. This method is a new approach to concurrent error correction in fault-tolerant computing systems. The paper proposes a novel computing paradigm to provide fault tolerance for numerical algorithms. The authors also present, implement, and evaluate early detection in ABFT.
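
The two-way parity comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes a plain sum checksum over an integer matrix-vector product instead of the convolutional codes the paper mentions, and it only detects an error rather than correcting it. The names matvec, column_sums, abft_matvec, and the fault parameter are hypothetical, chosen for illustration only.

```python
def matvec(A, x):
    """Plain matrix-vector product y = A x over integers."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def column_sums(A):
    """Input parity: the sum of each column of A (one checksum row)."""
    return [sum(col) for col in zip(*A)]


def abft_matvec(A, x, fault=None):
    """Compute y = A x and compare parity values obtained in two ways.

    fault is a hypothetical (index, delta) pair, used here only to
    simulate a corrupted output element.
    """
    y = matvec(A, x)
    if fault is not None:
        i, delta = fault
        y[i] += delta  # inject an error into the processed output

    # Way 1: push the input parity through the same linear operation.
    parity_from_inputs = sum(c * v for c, v in zip(column_sums(A), x))
    # Way 2: regenerate the parity from the processed outputs.
    parity_from_outputs = sum(y)

    # For a fault-free run the two values agree, because
    # sum_i (A x)_i = sum_j (sum_i A_ij) x_j.
    return y, parity_from_inputs == parity_from_outputs


A = [[1, 2], [3, 4]]
x = [5, 6]
print(abft_matvec(A, x))                # fault-free run
print(abft_matvec(A, x, fault=(0, 1)))  # one output element corrupted
```

Integer arithmetic keeps the comparison exact. With floating-point data the equality test would presumably need a tolerance, which this sketch does not attempt to choose.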
