An SRAM-Based Multibit In-Memory Matrix-Vector Multiplier With a Precision That Scales Linearly in Area, Time, and Power

Khaddam-Aljameh Riduan; Francese Pier-Andrea; Benini Luca; Eleftheriou Evangelos


Abstract

A novel interleaved switched-capacitor and SRAM-based multibit matrix-vector multiply-accumulate engine for in-memory computing is presented. Its operating principle is based on first converting an SRAM-stored n-bit weight into a proportional voltage using a pipeline D/A converter built from n + 1 equally sized stages. A switched-capacitor stage then multiplies these voltages with an m-bit digital input activation. Finally, the output voltages that correspond to the different multiplication results are accumulated along one column by means of charge-sharing. With the proposed architecture, the required circuit area, computation time, and power consumption scale linearly with the bit resolution of both the inputs and the weights. Analytical formulas are presented for the energy consumption in both capacitors and switches. Moreover, the impact of fabrication mismatch on analog computation accuracy is examined. The full system architecture is described, and its feasibility is demonstrated via a full macro implementation study in 14 nm, detailing area and energy consumption, as well as the overall latency. Finally, a specific design of a 128 × 2048 signed matrix-vector multiplication accelerator system with 6-bit weights and 6-bit inputs in 14 nm is presented, which runs at 2.43 TOP/s at an efficiency of 16.94 TOP/s/W, while using the nominal supply voltage of 0.8 V. If the operands' precision is considered in the metric, then the efficiency becomes 609.7 TOP/s/W.
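The three steps named in the abstract (weight D/A conversion, multiplication by the input, column accumulation by charge-sharing) can be illustrated with an idealized behavioral sketch. This is not the authors' circuit. It assumes unsigned operands, perfectly matched capacitors, and a bit-serial charge-sharing model in which each step averages the running voltage with the next bit's contribution. All function names are hypothetical. The last helper assumes that the precision-aware efficiency is obtained by multiplying the raw figure by both operand bit widths, an interpretation that the abstract's numbers suggest but do not state.

```python
from fractions import Fraction


def to_bits(value, width):
    """Return the bits of an unsigned integer, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def serial_charge_share(bits_lsb_first, v_ref):
    """Idealized bit-serial D/A conversion (assumed model).

    Each step shares charge between two equal capacitors, which halves the
    sum of the running voltage and the current bit's voltage. After k bits
    the result is v_ref * value / 2**k.
    """
    v = Fraction(0)
    for bit in bits_lsb_first:
        v = (v + bit * v_ref) / 2
    return v


def column_mac(weights, inputs, n_bits, m_bits, v_ref=Fraction(1)):
    """Ideal output voltage of one column for unsigned weights and inputs."""
    cell_voltages = []
    for w, x in zip(weights, inputs):
        # Step 1: n-bit weight -> proportional voltage.
        v_weight = serial_charge_share(to_bits(w, n_bits), v_ref)
        # Step 2: scale that voltage by the m-bit input, again bit-serially.
        v_product = serial_charge_share(to_bits(x, m_bits), v_weight)
        cell_voltages.append(v_product)
    # Step 3: charge-sharing across equal capacitors averages the voltages.
    return sum(cell_voltages) / len(cell_voltages)


def precision_scaled_efficiency(tops_per_watt, weight_bits, input_bits):
    """Assumed definition: raw efficiency times both operand bit widths."""
    return tops_per_watt * weight_bits * input_bits


if __name__ == "__main__":
    v_out = column_mac([3, 5], [2, 7], n_bits=3, m_bits=3)
    print("column voltage / v_ref:", v_out)
    print("scaled efficiency:", precision_scaled_efficiency(16.94, 6, 6))
```

In this model the column voltage equals the dot product divided by N · 2^n · 2^m, where N is the number of rows, so each additional weight or input bit adds one charge-sharing step. That matches the linear scaling in time claimed in the abstract. Under the assumed metric definition, 16.94 × 6 × 6 comes to about 609.8 TOP/s/W, close to the reported 609.7; the small gap is presumably due to rounding of the published figures.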


Bibliographic details

Source
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2021, Issue 2, pp. 372-385 (14 pages)
Authors
Khaddam-Aljameh Riduan; Francese Pier-Andrea; Benini Luca; Eleftheriou Evangelos
Author affiliations

IBM Zurich Res Lab CH-8803 Ruschlikon Switzerland|Swiss Fed Inst Technol Integrated Syst Lab CH-8092 Zurich Switzerland;

IBM Zurich Res Lab CH-8803 Ruschlikon Switzerland;

Swiss Fed Inst Technol Integrated Syst Lab CH-8092 Zurich Switzerland|Univ Bologna Dept Elect Elect & Informat Engn I-40136 Bologna Italy;

IBM Zurich Res Lab CH-8803 Ruschlikon Switzerland;

Original format: PDF
Language: English
Keywords
Analog computation; hardware accelerator; in-memory computation; multibit weights; SRAM;


