Journal: Integration

SubMac: Exploiting the subword-based computation in RRAM-based CNN accelerator for energy saving and speedup


Abstract

Although CMOS-based CNN accelerators have made impressive progress, the memory wall and high power density remain the major barriers to substantial improvements in energy efficiency and throughput. As an attractive alternative, Resistive RAM-based accelerators have recently delivered significant breakthroughs by leveraging in-situ computation. However, challenges remain, including high computation complexity and the large energy overhead of the analog/digital interfacing circuits. In this work, we take advantage of subword-based computation in the Resistive RAM-based accelerator to achieve energy saving and speedup. First, an encoding method is proposed for the weights and activations to reduce the energy consumption of the in-situ computation and the resolution requirement of the ADC. Then the ADC resolution is further optimized based on the distribution of the subword computation results. Furthermore, a dynamic quantization scheme is proposed that skips 67%-87% of the subword computations and outperforms conventional quantization schemes. We fully investigate the influence of the encoding scheme and of layer-wise quantization range scaling on the performance of dynamic quantization. Finally, we demonstrate the effectiveness of the proposed algorithms under different hardware configurations and network complexities. A dedicated architecture, SubMac, is proposed to implement the above schemes. Experimental results show that energy efficiency and throughput are improved by 2.8-5.7 times and 2.5-7.9 times, respectively, compared with state-of-the-art Resistive RAM-based accelerators.
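The abstract does not spell out how subword-based computation works, so the following is only a generic sketch of the underlying idea, not the SubMac encoding or its dynamic quantization scheme. It assumes unsigned 8-bit weights and activations split into 2-bit subwords; the function names `split_subwords` and `subword_dot` and the chosen bit widths are hypothetical. Each subword-by-subword partial sum is a low-precision result of the kind an ADC would digitize, and the partial sums are recombined by shift-and-add.

```python
def split_subwords(value, width=8, sub_bits=2):
    """Split an unsigned integer into subwords, least significant first.

    Assumes width is a multiple of sub_bits (illustrative choice).
    """
    mask = (1 << sub_bits) - 1
    return [(value >> shift) & mask for shift in range(0, width, sub_bits)]


def subword_dot(weights, activations, width=8, sub_bits=2):
    """Dot product assembled from subword-by-subword partial sums."""
    w_sub = [split_subwords(w, width, sub_bits) for w in weights]
    a_sub = [split_subwords(a, width, sub_bits) for a in activations]
    n_sub = width // sub_bits
    total = 0
    for i in range(n_sub):        # weight subword index
        for j in range(n_sub):    # activation subword index
            # One low-precision partial sum over the whole vector.
            partial = sum(w[i] * a[j] for w, a in zip(w_sub, a_sub))
            # Shift by the combined significance of the two subwords.
            total += partial << (sub_bits * (i + j))
    return total


weights = [13, 200, 7, 255]
activations = [91, 3, 128, 255]
exact = sum(w * a for w, a in zip(weights, activations))
assert subword_dot(weights, activations) == exact
```

With 2-bit subwords each 8-bit dot product becomes 16 partial sums. Partial sums with small `i + j` contribute the least to the final value, which gives an intuition for why skipping a share of subword computations can be tolerable; the selection rule the paper uses is not described in the abstract.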
