2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)

XNOR-RRAM: A scalable and parallel resistive synaptic architecture for binary neural networks



Abstract

Recent advances in deep learning have shown that Binary Neural Networks (BNNs) can provide satisfying accuracy on various image datasets with a significant reduction in computation and memory cost. With both weights and activations binarized to +1 or -1 in BNNs, the high-precision multiply-and-accumulate (MAC) operations can be replaced by XNOR and bit-counting operations. In this work, we propose an RRAM synaptic architecture (XNOR-RRAM) with a bit-cell design using complementary word lines that implements the equivalent XNOR and bit-counting operations in a parallel fashion. For large-scale matrices in fully connected layers, or when convolution kernels are unrolled across multiple channels, array partitioning is necessary. Multi-level sense amplifiers (MLSAs) are employed as the intermediate interface for accumulating partial weighted sums. However, the limited bit resolution and intrinsic offset of the MLSA may degrade the classification accuracy. We investigate the impact of sensing offsets on classification accuracy and analyze various design options with different sub-array sizes and sensing bit-levels. Experimental results with RRAM models and a 65 nm CMOS PDK show that a system with 128×128 sub-arrays and 3-bit MLSAs can achieve accuracies of 98.43% for an MLP on MNIST and 86.08% for a CNN on CIFAR-10, showing 0.34% and 2.39% degradation respectively compared to the accuracies of the ideal BNN algorithms. The projected energy-efficiency of XNOR-RRAM is 141.18 TOPS/W, a ~33X improvement over a conventional RRAM synaptic architecture with sequential row-by-row read-out.
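The XNOR/bit-counting substitution described in the abstract rests on a simple identity: for ±1 vectors of length N, the dot product equals 2·popcount(XNOR(a, w)) − N, where the vectors are encoded as bits (+1 → 1, −1 → 0). A minimal sketch of this equivalence (illustrative values only, not from the paper):

```python
# Sketch: a +/-1 dot product computed two ways -- as a high-precision MAC
# and as XNOR + bit-counting, the substitution used by BNN accelerators.
N = 8
acts = [1, -1, -1, 1, 1, 1, -1, 1]    # binarized activations
wts  = [1, 1, -1, -1, 1, -1, -1, 1]   # binarized weights

mac = sum(a * w for a, w in zip(acts, wts))   # conventional MAC

# Encode +1 -> 1, -1 -> 0; XNOR each pair; count the matching positions.
a_bits = [(a + 1) // 2 for a in acts]
w_bits = [(w + 1) // 2 for w in wts]
xnor = [1 - (ab ^ wb) for ab, wb in zip(a_bits, w_bits)]
popcount = sum(xnor)

# Identity: dot(a, w) = 2 * popcount - N
assert mac == 2 * popcount - N
```

In XNOR-RRAM the complementary word-line bit-cell realizes the XNOR in the array itself, so the popcount emerges as an analog current summed along the bit line rather than as a digital loop.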
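The role of the MLSA in the partitioned design can be illustrated with a toy model. The sketch below splits a long binary dot product across 128-row sub-arrays and quantizes each partial popcount to 2³ uniform levels before accumulation; the uniform quantizer and random inputs are assumptions for illustration, not the paper's circuit or data:

```python
# Illustrative sketch (assumed uniform quantizer, random inputs): a large
# binary dot product is partitioned across sub-arrays, and each sub-array's
# partial popcount is read out by a multi-level sense amplifier (MLSA)
# with only 2**BITS levels, quantizing the partial sums before accumulation.
import random

SUB = 128   # sub-array rows, matching the paper's 128x128 configuration
BITS = 3    # MLSA resolution, matching the paper's 3-bit setting

def mlsa_quantize(popcount, rows, bits):
    """Map an exact partial popcount (0..rows) onto 2**bits uniform levels."""
    step = rows / (2 ** bits - 1)
    return round(popcount / step) * step

random.seed(0)
n = 512
a = [random.choice([0, 1]) for _ in range(n)]
w = [random.choice([0, 1]) for _ in range(n)]

exact = quantized = 0
for start in range(0, n, SUB):
    pc = sum(1 - (ai ^ wi)
             for ai, wi in zip(a[start:start + SUB], w[start:start + SUB]))
    exact += pc                                  # ideal accumulation
    quantized += mlsa_quantize(pc, SUB, BITS)    # MLSA-limited accumulation

# Each partial sum's quantization error is at most half a step, so the
# accumulated error is bounded by (n / SUB) * step / 2.
err = abs(exact - quantized)
```

This bounded-error behavior is why the paper sees only a small accuracy drop (0.34% on MNIST, 2.39% on CIFAR-10) relative to the ideal BNN, and why larger sub-arrays or fewer sense levels trade energy efficiency against accuracy.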


