Journal: IEEE Transactions on Parallel and Distributed Systems

FT-CNN: Algorithm-Based Fault Tolerance for Convolutional Neural Networks


Abstract

Convolutional neural networks (CNNs) are becoming increasingly important for solving challenging and critical problems in many fields. CNN inference applications have been deployed in safety-critical systems, which may suffer from soft errors caused by high-energy particles, high temperature, or abnormal voltage. Ensuring the stability of the CNN inference process against soft errors is therefore of critical importance. Traditional fault tolerance methods are not suitable for CNN inference because error-correcting code is unable to protect computational components, instruction duplication techniques incur high overhead, and existing algorithm-based fault tolerance (ABFT) techniques cannot protect all convolution implementations. In this article, we focus on how to protect the CNN inference process against soft errors as efficiently as possible, with the following three contributions. (1) We propose several systematic ABFT schemes based on checksum techniques and thoroughly analyze their fault protection ability and runtime. Unlike traditional ABFT based on matrix-matrix multiplication, our schemes support any convolution implementation. (2) We design a novel workflow integrating all the proposed schemes to obtain a high detection/correction ability with limited total runtime overhead. (3) We perform our evaluation using ImageNet with well-known CNN models including AlexNet, VGG-19, ResNet-18, and YOLOv2. Experimental results demonstrate that our implementation can handle soft errors with very limited runtime overhead (4% to 8% in both error-free and error-injected situations).
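
The abstract does not spell out the proposed schemes, so the following is only a minimal sketch of the general checksum idea behind ABFT for convolution, not the paper's actual method. It relies on one fact: convolution is linear in the kernel, so the sum of the outputs of several kernels should equal the output of a single "checksum kernel" formed by adding those kernels elementwise. All names here (`conv2d`, `checksum_kernel`, `verify`) and the tolerance value are hypothetical and chosen for illustration.

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation of one channel with one kernel."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += image[i + a][j + b] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out


def checksum_kernel(kernels):
    """Elementwise sum of all kernels (assumed checksum encoding)."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    return [[sum(k[a][b] for k in kernels) for b in range(kw)]
            for a in range(kh)]


def verify(outputs, checksum_output, tol=1e-6):
    """True if the per-kernel outputs sum to the checksum output everywhere."""
    for i in range(len(checksum_output)):
        for j in range(len(checksum_output[0])):
            total = sum(o[i][j] for o in outputs)
            if abs(total - checksum_output[i][j]) > tol:
                return False
    return True


image = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 10.0, 11.0, 12.0],
         [13.0, 14.0, 15.0, 16.0]]
kernels = [[[1.0, 0.0], [0.0, -1.0]],
           [[0.5, 0.5], [0.5, 0.5]],
           [[0.0, 1.0], [-1.0, 0.0]]]

# Any convolution routine could produce these outputs; the check only
# looks at the results, not at how they were computed.
outputs = [conv2d(image, k) for k in kernels]
checksum_output = conv2d(image, checksum_kernel(kernels))

clean_ok = verify(outputs, checksum_output)

# Simulate a soft error by corrupting one output element.
outputs[1][0][1] += 4.0
faulty_ok = verify(outputs, checksum_output)
```

Because the check compares results only, it does not depend on how the convolution itself is implemented, which is in the spirit of the abstract's claim about supporting any convolution implementation. This sketch detects a mismatch but does not locate or correct the faulty element; a practical scheme would also need a tolerance suited to floating-point rounding.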