
10T SRAM Computing-in-Memory Macros for Binary and Multibit MAC Operation of DNN Edge Processors


Abstract

Computing-in-memory (CIM) is a promising approach to reduce latency and improve the energy efficiency of the multiply-and-accumulate (MAC) operation under the memory-wall constraint of artificial intelligence (AI) edge processors. This paper proposes an approach focusing on scalable CIM designs using a new ten-transistor (10T) static random access memory (SRAM) bit-cell. Using the proposed 10T SRAM bit-cell, we present two SRAM-based CIM (SRAM-CIM) macros supporting multibit and binary MAC operations. The first design achieves fully parallel computing and high throughput using 32 parallel binary MAC operations. Advanced circuit techniques such as an input-dependent dynamic reference generator and an input-boosted sense amplifier are presented. Fabricated in a 28 nm CMOS process, this design achieves 409.6 GOPS throughput, 1001.7 TOPS/W energy efficiency, and an area efficiency of 169.9 TOPS/mm2. The proposed approach effectively addresses problems of previous designs, such as write disturb, limited throughput, and the power consumption of the analog-to-digital converter (ADC). The second design supports multibit MAC operation (4-b weight, 4-b input, and 8-b output) to increase the inference accuracy. We propose an architecture that divides the 4-b weight by 4-b input multiplication into four 2-b multiplications performed in parallel, which increases the signal margin by $16\times$ compared to conventional 4-b multiplication. Besides, the capacitive digital-to-analog converter (CDAC) area issue is effectively addressed using the intrinsic bit-line capacitance existing in the SRAM-CIM architecture. The proposed approach of realizing four 2-b parallel multiplications using the CDAC is successfully demonstrated with a modified LeNet-5 neural network. These results demonstrate that the proposed 10T bit-cell is promising for realizing robust and scalable SRAM-CIM designs, which is essential for realizing fully parallel edge computing.
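The arithmetic behind the two MAC modes in the abstract can be checked numerically. The Python sketch below is purely illustrative: the function names (`binary_mac`, `mac_4b_via_2b`) and the unsigned treatment of the 4-b operands are assumptions, and the actual macro carries out these operations in the analog domain on the SRAM bit-lines and CDAC rather than in software.

```python
# Numerical sketch of the two MAC modes described in the abstract.
# Function names and unsigned 4-b arithmetic are assumptions; the silicon
# macro evaluates these operations on bit-lines, not as software loops.

def binary_mac(inputs, weights):
    """Binary MAC over +/-1 values encoded as bits (1 -> +1, 0 -> -1).

    The macro evaluates 32 such products in parallel; here the same result
    is reproduced with the XNOR/popcount identity: dot = 2*matches - n.
    """
    assert len(inputs) == len(weights)
    n = len(inputs)
    matches = sum(1 for a, b in zip(inputs, weights) if a == b)  # popcount of XNOR
    return 2 * matches - n


def mac_4b_via_2b(inputs_4b, weights_4b):
    """4-b x 4-b MAC built from four 2-b x 2-b partial products.

    Splitting w = 4*w_hi + w_lo and x = 4*x_hi + x_lo gives
    w*x = 16*w_hi*x_hi + 4*w_hi*x_lo + 4*w_lo*x_hi + w_lo*x_lo,
    i.e. four 2-b multiplications that can run in parallel.  Signed handling
    and the CDAC-based charge accumulation of the real macro are not modeled.
    """
    acc = 0
    for w, x in zip(weights_4b, inputs_4b):
        w_hi, w_lo = w >> 2, w & 0x3
        x_hi, x_lo = x >> 2, x & 0x3
        acc += ((w_hi * x_hi) << 4) + ((w_hi * x_lo) << 2) \
             + ((w_lo * x_hi) << 2) + (w_lo * x_lo)
    return acc


if __name__ == "__main__":
    # Binary mode: (+1,-1,+1,+1) . (+1,+1,+1,-1) = 0
    print(binary_mac([1, 0, 1, 1], [1, 1, 1, 0]))

    # Multibit mode: the 2-b decomposition matches direct 4-b multiplication.
    import random
    xs = [random.randrange(16) for _ in range(8)]
    ws = [random.randrange(16) for _ in range(8)]
    assert mac_4b_via_2b(xs, ws) == sum(w * x for w, x in zip(ws, xs))
```

In the macro itself, each 2-b partial product sees only a quarter of the dynamic range of a direct 4-b multiplication, which is one way to read the claimed 16x improvement in signal margin.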
