
Impact of Structural Faults on Neural Network Performance

Abstract

Deep Learning (DL), a subset of Artificial Intelligence (AI), is growing rapidly, with applications in domains such as speech recognition and computer vision. The Deep Neural Network (DNN), the backbone of DL algorithms, is a directed graph containing multiple layers, each with a different number of neurons. The use of these networks has increased in the last few years due to the availability of large data sets and huge computational power. As DNNs have grown in size, researchers have developed specialized hardware accelerators to reduce inference compute time. An example of such a domain-specific architecture designed for neural network acceleration is the Tensor Processing Unit (TPU), which outperforms the GPU in the inference stage of DNN execution. The heart of this inference engine is a matrix multiplication unit based on a systolic array architecture. The TPU's systolic array is a grid-like structure made of individual processing elements that can be extended along rows and columns. Due to external environmental factors or internal semiconductor scaling, these systems are prone to faults, which lead to improper calculations and thereby result in inaccurate decisions by the DNN. Although a lot of work has been done in the past on computing array implementations and their reliability, the fault-tolerance behavior of such arrays for DNN applications is not well understood; it is not even clear what impact the various faults have on accuracy.

In this work, we first study possible mapping strategies to implement convolution and dense layer weights on the TPU systolic array. Next, we consider various fault scenarios that may occur in the array. We divide these fault scenarios into low and high row- and column-fault modes with respect to the multiplication unit (Fig. 1(a) pictorially represents column faults). We then study the impact of these fault models on the overall accuracy of the DNN running on a faulty TPU unit. The goal is to study resiliency and overcome the limitations of earlier work. The previous work, which used pruning of weights (removing weights or connections in the DNN) plus retraining to mask faults on the array, was very effective in masking random faults; however, it fails in the case of column faults, as clearly shown in Fig. 1(b). We also propose techniques to mitigate or bypass row and column faults. Our mapping strategy follows physical_x(i) = i % N and physical_y(j) = j % N, where (i, j) is the index of the dense (FC) weight matrix and (physical_x(i), physical_y(j)) is the actual physical location on the array of size N. The convolution filters are linearized with respect to every channel to convert them into a proper weight matrix, which is then mapped according to the same policy.

Our experiments show that DNNs can tolerate a certain number of faults in the array while retaining the original accuracy (low row faults). The accuracy of the network decreases with even one column fault if that column is in use. For the same number of row and column faults, the latter have the greater impact on network accuracy, because pruning an input neuron has far less effect than pruning an output neuron. We experimented with three different networks and found the influence of these faults to be the same across them. These faults can be mitigated using techniques such as Matrix Transpose and Array Reduction, which do not require retraining of the weights.
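As an illustration, the following is a minimal sketch of the mapping policy above, assuming an N x N array, a NumPy representation of the dense (FC) weight matrix, and per-channel flattening of convolution filters; the function names and data layout are illustrative and not taken from the paper.

import numpy as np

def map_dense_weights(W, N):
    """Group each weight W[i, j] by its physical location
    (physical_x(i), physical_y(j)) = (i % N, j % N) on an N x N systolic array."""
    placement = {}
    rows, cols = W.shape
    for i in range(rows):
        for j in range(cols):
            loc = (i % N, j % N)              # mapping policy from the abstract
            placement.setdefault(loc, []).append((i, j))
    return placement

def linearize_conv_filters(filters):
    """Flatten convolution filters of shape (out_channels, in_channels, kh, kw)
    into a 2-D weight matrix (one row per output channel), so the dense
    mapping policy above can be reused."""
    out_channels = filters.shape[0]
    return filters.reshape(out_channels, -1)

# Example: a 6x6 FC weight matrix on a 4x4 array; logical row/column 4 shares
# physical row/column 0 with logical row/column 0, and so on.
W = np.arange(36, dtype=float).reshape(6, 6)
placement = map_dense_weights(W, N=4)
print(placement[(0, 0)])    # [(0, 0), (0, 4), (4, 0), (4, 4)]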
For low row faults, the original mapping policy can be retained so that weights are mapped to their exact locations, which does not affect accuracy. Low column faults can be converted into low row faults by transposing the matrix. In the case of high row (column) faults, the entire row (column) has to be avoided to completely bypass the faulty locations. Static mapping of weights along with retraining the network on the array can be effective for random faults, while adapting the mapping to such structured faults reduces the burden of retraining, which happens outside the TPU.
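The following is a minimal sketch, under assumed fault bookkeeping (sets of faulty physical row or column indices) and the i % N mapping above, of how the Matrix Transpose and Array Reduction ideas act on the index arithmetic; the helper names are illustrative and not taken from the paper.

import numpy as np

def transpose_mapping(W):
    """Matrix Transpose: map W^T instead of W, so that low column faults on
    the physical array turn into low row faults, which the results show the
    network tolerates without accuracy loss."""
    return np.asarray(W).T

def reduced_index(i, N, faulty_phys):
    """Array Reduction: for high row (column) faults, bypass the faulty
    physical rows (columns) entirely by re-indexing over the healthy ones."""
    healthy = [p for p in range(N) if p not in faulty_phys]
    return healthy[i % len(healthy)]   # replaces the plain i % N of the original policy

# Example: on a 16x16 array with physical rows {3, 7} faulty, logical row 3
# of the weight matrix is placed on healthy physical row 4 instead.
print(reduced_index(3, N=16, faulty_phys={3, 7}))   # prints 4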