Design, Automation & Test in Europe Conference & Exhibition (DATE)

XNOR-RRAM: A scalable and parallel resistive synaptic architecture for binary neural networks

Abstract

Recent advances in deep learning have shown that Binary Neural Networks (BNNs) can deliver satisfactory accuracy on various image datasets with a significant reduction in computation and memory cost. With both weights and activations binarized to +1 or -1 in BNNs, the high-precision multiply-and-accumulate (MAC) operations can be replaced by XNOR and bit-counting operations. In this work, we propose an RRAM synaptic architecture (XNOR-RRAM) with a bit-cell design based on complementary word lines that implements the equivalent XNOR and bit-counting operations in a parallel fashion. For the large matrices in fully connected layers, or when convolution kernels are unrolled across multiple channels, array partitioning is necessary. Multi-level sense amplifiers (MLSAs) are employed as the intermediate interface for accumulating partial weighted sums. However, a low MLSA bit-level and the intrinsic offset of the MLSA may degrade classification accuracy. We investigate the impact of sensing offsets on classification accuracy and analyze various design options with different sub-array sizes and sensing bit-levels. Experimental results with RRAM models and a 65nm CMOS PDK show that a system with a 128×128 sub-array size and a 3-bit MLSA achieves 98.43% accuracy for an MLP on MNIST and 86.08% for a CNN on CIFAR-10, a degradation of 0.34% and 2.39%, respectively, compared to the ideal BNN algorithms. The projected energy efficiency of XNOR-RRAM is 141.18 TOPS/W, a ~33X improvement over a conventional RRAM synaptic architecture with sequential row-by-row read-out.
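To make the core idea concrete, the sketch below (Python, not from the paper) illustrates the XNOR/bit-counting equivalence for {+1, -1} weights and activations, and a toy version of accumulating quantized partial sums from partitioned sub-arrays. The uniform quantizer standing in for the MLSA, the bit encoding, and all function and parameter names are illustrative assumptions, not the authors' implementation.

import numpy as np

def binary_mac(weights, activations):
    # Exact dot product with weights/activations in {+1, -1}.
    return int(np.dot(weights, activations))

def xnor_popcount_mac(w_bits, a_bits):
    # Equivalent dot product with {+1, -1} encoded as bits {1, 0}:
    # dot = 2 * popcount(XNOR(w, a)) - N.
    n = len(w_bits)
    matches = np.count_nonzero(np.logical_not(np.logical_xor(w_bits, a_bits)))
    return 2 * int(matches) - n

def quantize_partial_sum(psum, sub_array_size, bits):
    # Hypothetical uniform quantizer standing in for a multi-level
    # sense amplifier (MLSA) reading one sub-array's partial sum.
    levels = 2 ** bits
    step = 2 * sub_array_size / levels
    q = np.clip(np.round(psum / step), -(levels // 2), levels // 2 - 1)
    return q * step

rng = np.random.default_rng(0)
N = 512                              # e.g. one unrolled convolution kernel
w = rng.choice([-1, 1], N)
a = rng.choice([-1, 1], N)

# Encode +1 -> 1, -1 -> 0 and check the XNOR/popcount equivalence.
w_bits, a_bits = (w > 0), (a > 0)
assert binary_mac(w, a) == xnor_popcount_mac(w_bits, a_bits)

# Partition the 512-long dot product into 128-row sub-arrays and
# accumulate 3-bit-quantized partial sums, mirroring the 128x128 / 3-bit
# design point discussed above.
sub, bits = 128, 3
approx = sum(
    quantize_partial_sum(binary_mac(w[i:i + sub], a[i:i + sub]), sub, bits)
    for i in range(0, N, sub)
)
print("exact:", binary_mac(w, a), "quantized partial-sum total:", approx)

With low-bit partial sums, the reconstructed total is only an approximation of the exact dot product; this quantization (together with sensing offsets) is the source of the small accuracy degradation reported above.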
