SubMac: Exploiting the subword-based computation in RRAM-based CNN accelerator for energy saving and speedup

Chen Xizi; Jiang Jingbo; Zhu Jingyang; Tsui Chi-Ying

Abstract

Although CMOS-based CNN accelerators have made impressive progress, the memory wall and high power density remain major barriers to substantial improvements in energy efficiency and throughput. As an attractive alternative, Resistive RAM-based accelerators have recently delivered significant breakthroughs by leveraging in-situ computation. However, challenges remain, including high computation complexity and a large energy overhead at the analog/digital interface circuits. In this work, we take advantage of subword-based computation in the Resistive RAM-based accelerator to achieve energy saving and speedup. First, an encoding method is proposed for the weights and activations to reduce the energy consumption of the in-situ computation and the resolution requirement of the ADC. The ADC resolution is then further optimized based on the distribution of the subword computation results. Furthermore, a dynamic quantization scheme is proposed that skips 67%-87% of the subword computations and outperforms conventional quantization schemes. We fully investigate the influence of the encoding scheme and of layer-wise quantization range scaling on the performance of dynamic quantization. Finally, we demonstrate the effectiveness of the proposed algorithms under different hardware configurations and network complexities. A dedicated architecture, SubMac, is proposed to implement the above schemes. Experimental results show that energy efficiency and throughput are improved by 2.8-5.7 times and 2.5-7.9 times, respectively, compared with state-of-the-art Resistive RAM-based accelerators.
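The idea of subword-based computation mentioned in the abstract can be illustrated with a minimal sketch. This is not the SubMac encoding or its dynamic quantization scheme, which the abstract does not specify; it is generic bit-slicing under stated assumptions: unsigned 8-bit weights and activations, 2-bit subwords, and a worst-case (not distribution-based) ADC bound. The names `split_subwords`, `subword_dot`, and `adc_bits_needed` are hypothetical and chosen for illustration only.

```python
def split_subwords(value, subword_bits, num_subwords):
    """Split a non-negative integer into subwords, least significant first."""
    mask = (1 << subword_bits) - 1
    return [(value >> (i * subword_bits)) & mask for i in range(num_subwords)]


def subword_dot(weights, activations, total_bits=8, subword_bits=2):
    """Dot product rebuilt from subword partial sums via shift-and-add.

    Assumes every weight and activation is an unsigned integer that fits
    in total_bits, and that total_bits is a multiple of subword_bits.
    """
    num_subwords = total_bits // subword_bits
    w_sub = [split_subwords(w, subword_bits, num_subwords) for w in weights]
    a_sub = [split_subwords(a, subword_bits, num_subwords) for a in activations]
    total = 0
    for i in range(num_subwords):      # weight subword index
        for j in range(num_subwords):  # activation subword index
            # One subword computation: the small partial sum that a crossbar
            # column would accumulate before analog-to-digital conversion.
            partial = sum(w[i] * a[j] for w, a in zip(w_sub, a_sub))
            # Digital shift-and-add restores the subword's significance.
            total += partial << ((i + j) * subword_bits)
    return total


def adc_bits_needed(rows, subword_bits):
    """Worst-case bits needed to represent one subword partial sum."""
    max_subword = (1 << subword_bits) - 1
    return (rows * max_subword * max_subword).bit_length()


weights = [3, 200, 17]
activations = [255, 1, 64]
print(subword_dot(weights, activations))  # 2053, same as the direct dot product
print(adc_bits_needed(rows=128, subword_bits=2))
```

Each of the 16 subword computations in this sketch produces a much narrower result than the full 8-bit product, which is why narrower subwords relax the worst-case ADC resolution. A scheme that skips subword computations, such as the dynamic quantization described in the abstract, would omit some iterations of the inner loop.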

Bibliographic details

Source: Integration, 2019, Issue 11, pp. 356-368 (13 pages)
Authors: Chen Xizi; Jiang Jingbo; Zhu Jingyang; Tsui Chi-Ying
Affiliation: Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Clear Water Bay, Hong Kong, Peoples R China
Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords: Convolutional neural network; Resistive RAM; Subword encoding; Dynamic quantization; Computation reduction

Similar literature

1. Reliable and Energy Efficient MLC STT-RAM Buffer for CNN Accelerators [J]. Jasemi Masoomeh, Hessabi Shaahin, Bagherzadeh Nader. Computers and Electrical Engineering. 2020, Issue 1

2. Phone2Cloud: Exploiting computation offloading for energy saving on smartphones in mobile cloud computing [J]. Feng Xia, Fangwei Ding, Jie Li. Information Systems Frontiers. 2014, Issue 1

3. SqueezeFlow: A Sparse CNN Accelerator Exploiting Concise Convolution Rules [J]. Li Jiajun, Jiang Shuhao, Gong Shijun. IEEE Transactions on Computers. 2019, Issue 11

4. RingCNN: Exploiting Algebraically-Sparse Ring Tensors for Energy-Efficient CNN-Based Computational Imaging [C]. Chao-Tsung Huang. ACM/IEEE Annual International Symposium on Computer Architecture. 2021

5. Predictive energy management in smart vehicles: Exploiting traffic and traffic signal preview for fuel saving [D]. Asadi, Behrang. 2009

6. Exploiting Web Matrix Permutations to Speedup PageRank Computation [O]. Del Corso Gianna, Gulli Antonio, Romani Francesco. 2004
