Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems

An SRAM-Based Multibit In-Memory Matrix-Vector Multiplier With a Precision That Scales Linearly in Area, Time, and Power


Abstract

A novel interleaved switched-capacitor and SRAM-based multibit matrix-vector multiply-accumulate engine for in-memory computing is presented. Its operating principle is to first convert an SRAM-stored n-bit weight into a proportional voltage using a pipeline D/A converter built from n + 1 equally sized stages. A switched-capacitor stage then multiplies these voltages by an m-bit digital input activation. Finally, the output voltages that correspond to the different multiplication results are accumulated along one column by means of charge sharing. With the proposed architecture, the required circuit area, computation time, and power consumption scale linearly with the bit resolution of both the inputs and the weights. Analytical formulas are presented for the energy consumption in both capacitors and switches. Moreover, the impact of fabrication mismatch on analog computation accuracy is examined. The full system architecture is described, and its feasibility is demonstrated via a full macro implementation study in 14 nm, detailing area and energy consumption as well as the overall latency. Finally, a specific design of a 128 × 2048 signed matrix-vector multiplication accelerator system with 6-bit weights and 6-bit inputs in 14 nm is presented, which runs at 2.43 TOP/s at an efficiency of 16.94 TOP/s/W while using the nominal supply voltage of 0.8 V. If the operands' precision is considered in the metric, then the efficiency becomes 609.7 TOP/s/W.
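The three steps in the abstract (weight-to-voltage conversion, multiplication by the input, accumulation by charge sharing) can be illustrated with an idealized behavioral sketch. This is not the authors' circuit: it assumes unsigned operands, perfectly matched capacitors, lossless switches, and a bit-serial, LSB-first charge-sharing scheme in which each step halves the running voltage. The function names `serial_dac`, `sc_multiply`, and `column_mac` are hypothetical, chosen only for this example.

```python
from fractions import Fraction

def serial_dac(code, bits, vref=Fraction(1)):
    """Idealized weight-to-voltage conversion (assumption, not the paper's circuit).

    Each step shares charge between two equal capacitors, so the running
    voltage is averaged with bit * vref. Processing bits LSB first yields
    vref * code / 2**bits after `bits` steps.
    """
    v = Fraction(0)
    for i in range(bits):
        bit = (code >> i) & 1
        v = (v + bit * vref) / 2
    return v

def sc_multiply(v_weight, x, bits):
    """Idealized multiplication of a weight voltage by an unsigned digital input.

    Same averaging step as above, with the weight voltage as the reference,
    giving v_weight * x / 2**bits.
    """
    v = Fraction(0)
    for j in range(bits):
        bit = (x >> j) & 1
        v = (v + bit * v_weight) / 2
    return v

def column_mac(weights, inputs, n_bits, m_bits, vref=Fraction(1)):
    """Accumulate the products along one column by ideal charge sharing.

    Connecting N equal capacitors together leaves the mean of their voltages.
    """
    products = [
        sc_multiply(serial_dac(w, n_bits, vref), x, m_bits)
        for w, x in zip(weights, inputs)
    ]
    return sum(products) / len(products)

# Four-row column with 6-bit weights and 6-bit inputs (illustrative values).
weights = [3, 63, 0, 17]
inputs = [5, 1, 63, 42]
v_col = column_mac(weights, inputs, n_bits=6, m_bits=6)
dot = sum(w * x for w, x in zip(weights, inputs))
print(v_col, dot)
```

In this model the column voltage equals the digital dot product scaled by 1 / (N · 2^(n+m)), and each product takes n + m averaging steps, which mirrors the linear scaling in bit resolution that the abstract claims. Separately, the two efficiency figures differ by a factor of about 36 (609.7 / 16.94), which appears consistent with scaling by the product of the two 6-bit operand widths.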
