Low Bit-Width Convolutional Neural Network on RRAM


Abstract

The emerging resistive random-access memory (RRAM) has been widely applied to accelerate the computation of deep neural networks. However, achieving high-precision computation on RRAM is challenging due to the limited number of resistance levels and the precision of the interfaces. Low bit-width convolutional neural networks (CNNs) offer a promising way to introduce low bit-width RRAM devices and low bit-width interfaces into RRAM-based computing systems (RCS). However, open questions remain: 1) how to split a weight matrix when a single crossbar is not large enough to hold all of its parameters; 2) how to design a pipeline based on a line buffer structure to accelerate inference; and 3) how to reduce the accuracy drop caused by parameter splitting and data quantization. In this paper, we propose an RRAM crossbar-based low bit-width CNN (LB-CNN) accelerator. We discuss the system design in detail, including matrix splitting strategies that enhance scalability and a pipelined implementation based on line buffers that accelerates inference. In addition, we propose a splitting-and-quantizing-while-training method that incorporates the actual hardware constraints into training. In our experiments, a low bit-width LeNet-5 on RRAM shows much better robustness to device variation than multibit models. The pipeline strategy achieves approximately 6.0x speedup per image on ResNet-18. For a low-bit VGG-8 on CIFAR-10, the proposed accelerator saves 54.9% of the energy consumption and 48.3% of the area compared with the multibit VGG-8 structure.
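The matrix splitting the abstract refers to can be illustrated with a minimal NumPy sketch: a weight matrix larger than one crossbar is tiled into fixed-size blocks, each tile performs one (here, idealized digital) matrix-vector product, and the partial sums along the input dimension are accumulated. The tile size of 128x128 and the function name are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def split_matmul(x, W, xbar_rows=128, xbar_cols=128):
    """Emulate a large matrix-vector product on fixed-size crossbars.

    W (in_dim x out_dim) is tiled into blocks of at most
    xbar_rows x xbar_cols; each tile stands in for one crossbar's
    analog MVM, and partial sums along the input dimension are
    accumulated digitally. Tile sizes here are illustrative only.
    """
    in_dim, out_dim = W.shape
    y = np.zeros(out_dim)
    for r in range(0, in_dim, xbar_rows):
        for c in range(0, out_dim, xbar_cols):
            tile = W[r:r + xbar_rows, c:c + xbar_cols]       # one crossbar's worth of weights
            y[c:c + xbar_cols] += x[r:r + xbar_rows] @ tile  # partial sum for this tile
    return y
```

Regardless of the tiling, the accumulated result equals the full matrix-vector product, which is why splitting affects accuracy only through the extra quantization and device non-idealities the paper addresses, not through the tiling itself.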
