IEEE International Solid-State Circuits Conference

15.4 A 22nm 2Mb ReRAM Compute-in-Memory Macro with 121-28TOPS/W for Multibit MAC Computing for Tiny AI Edge Devices


Abstract

Nonvolatile computing-in-memory (nvCIM) can improve the latency (tAC) and energy efficiency (EFMAC) of tiny AI edge devices performing multiply-and-accumulate (MAC) computing after system wake-up. Prior nvCIMs have proven effective for binary input (IN) and weight (W) with 3b output (OUT) [1], 1-8-1b IN-W-OUT [2], and 2-3-4b IN-W-OUT [3] neural networks; however, higher precision (4-4b IN-W) for MAC operations is needed for multibit CNNs to achieve high inference accuracy [4]. As Fig. 15.4.1 shows, improving the precision of nvCIM macros involves several challenges. (1) A large number of activated WLs produces a wide range of BL current (IBL), resulting in an inaccurate BL-clamping voltage (VBLC); in addition, a large IBL requires a large array area, because wide metal lines are needed to support the high current density. (2) Previous “WL = input” approaches suffer from (a) few parallel inputs (IN#) due to (1), and (b) a long tAC, because multibit inputs take multiple cycles of binary WL inputs on 1T1R cells. (3) Previous positive-negative-split weight mapping consumes a high total IBL and incurs area overhead (needing 2x(m-1) cells for a signed m-bit weight) in cell arrays with high weight precision. (4) High-precision outputs require a long tAC and a large number of reference currents (IREF#). To overcome these challenges, this work proposes: (1) a BL-IN-OUT multibit computing (BLIOMC) scheme using a single WL-on and input-aware multibit BL clamping (IA-MBC) to shorten IBL for multibit inputs, increase IN#, and reduce the IBL range/size for an accurate VBLC and a compact array area; (2) scrambled 2's complement (S2C) weight mapping (S2CWM), input-aware source-line (SL) voltage biasing (IA-SLVB), and an S2C value combiner (S2CVC) to reduce the area overhead and IBL in the cell array; and (3) a dual-bit small-offset current-mode sense amplifier (DbSO-CSA) to reduce IREF# and tAC.
A fabricated 22nm 2Mb ReRAM-CIM macro presents the first 4b-input nvCIM macro, featuring a 9.8-18.3ns tAC and an EFMAC of 121.3-28.9TOPS/W from binary to 4bIN-4bW-11bOUT compute precisions.
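
To make the 4bIN-4bW-11bOUT precision concrete, the following is a minimal arithmetic sketch in Python of a multibit MAC with 2's-complement weights. It is not the macro's circuit behavior and does not model the S2C scrambling, BL clamping, or sensing described above. The function names, the treatment of inputs as unsigned 4b values, and the accumulation count of 8 are all assumptions made for illustration; the abstract does not state them.

```python
from typing import List, Sequence


def twos_complement_bits(w: int, m: int) -> List[int]:
    """Return the m-bit 2's-complement bits of w, most significant bit first."""
    if not -(1 << (m - 1)) <= w < (1 << (m - 1)):
        raise ValueError(f"{w} does not fit in a signed {m}-bit weight")
    u = w & ((1 << m) - 1)  # wrap negative values into 0 .. 2^m - 1
    return [(u >> i) & 1 for i in reversed(range(m))]


def from_twos_complement(bits: Sequence[int]) -> int:
    """Recombine bits (MSB first): the sign bit carries weight -2^(m-1)."""
    m = len(bits)
    value = -bits[0] * (1 << (m - 1))
    for i, b in enumerate(bits[1:], start=1):
        value += b << (m - 1 - i)
    return value


def mac(inputs: Sequence[int], weights: Sequence[int],
        in_bits: int = 4, w_bits: int = 4) -> int:
    """Multiply-and-accumulate with unsigned inputs (assumed) and signed weights."""
    total = 0
    for x, w in zip(inputs, weights):
        if not 0 <= x < (1 << in_bits):
            raise ValueError(f"{x} does not fit in an unsigned {in_bits}-bit input")
        total += x * from_twos_complement(twos_complement_bits(w, w_bits))
    return total


def output_bits(n_acc: int, in_bits: int = 4, w_bits: int = 4) -> int:
    """Smallest signed bit width that holds every possible sum of n_acc products."""
    x_max = (1 << in_bits) - 1
    w_min = -(1 << (w_bits - 1))
    w_max = (1 << (w_bits - 1)) - 1
    lo, hi = n_acc * x_max * w_min, n_acc * x_max * w_max
    n = 1
    while not (-(1 << (n - 1)) <= lo and hi <= (1 << (n - 1)) - 1):
        n += 1
    return n


if __name__ == "__main__":
    print(twos_complement_bits(-3, 4))            # [1, 1, 0, 1]
    print(mac([15, 3, 0, 7], [-8, 7, 5, -1]))     # -106
    print(output_bits(8))                         # 11 (n_acc = 8 is hypothetical)
```

Under these assumptions, a sum of 8 products of a 4b unsigned input and a 4b signed weight spans -960 to 840, which fits in 11 signed bits, while 9 products would need 12. This is only a consistency check on the stated precisions, not a claim about how many inputs the macro accumulates.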
