Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Reliability Evaluation and Analysis of FPGA-Based Neural Network Acceleration System

Abstract

Prior work has typically analyzed faults in neural network accelerator computing arrays through simulation and focused on the prediction accuracy loss of the neural network models. A systematic fault analysis of the neural network acceleration system that considers both accuracy degradation and system exceptions, such as system stalls and running overtime, is still lacking. To that end, we implemented a representative neural network accelerator and corresponding fault injection modules on a Xilinx ARM-FPGA platform and evaluated the reliability of the system under different fault injection rates with a series of typical neural network models deployed on it. The entire fault injection and reliability evaluation system is open-sourced on GitHub. Through comprehensive experiments, we identify the system exceptions from the various abnormal behaviors of the FPGA-based neural network acceleration system and analyze the underlying causes. In particular, we find that the probability of system exceptions dominates the reliability of the system. The faults also degrade the accuracy of the neural network models, but the influence depends on the application of each model and can vary greatly. We also evaluated conventional triple modular redundancy (TMR) and demonstrated its challenges with both experiments and analytical models, which may shed light on the reliability design of FPGA-based neural network acceleration systems.
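The abstract does not give the authors' analytical model for TMR. As a rough illustration only, the classical textbook model can be sketched in Python, assuming three independent replicas that each produce a correct result with probability r and an ideal (fault-free) voter. The function names `tmr_reliability` and `majority_vote` are hypothetical and are not taken from the paper or its GitHub repository.

```python
def tmr_reliability(r: float) -> float:
    """Probability that at least two of three independent replicas are correct.

    Assumes an ideal voter and independent replica failures (textbook model,
    not necessarily the model used in the paper).
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be a probability in [0, 1]")
    # P(all three correct) + P(exactly two correct) = r^3 + 3 r^2 (1 - r)
    return 3 * r**2 - 2 * r**3


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three replica outputs."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    for r in (0.4, 0.5, 0.9):
        print(f"r={r:.2f}  TMR={tmr_reliability(r):.3f}")
    # One replica (the third) has two flipped bits; the vote masks them.
    print(bin(majority_vote(0b1010, 0b1010, 0b0110)))
```

Under these assumptions TMR improves on a single replica only when r exceeds 0.5. Faults that stall the whole system rather than corrupt one replica's output fall outside this simple model.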
