IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Vesti: Energy-Efficient In-Memory Computing Accelerator for Deep Neural Networks



Abstract

To enable essential deep learning computation on energy-constrained hardware platforms, including mobile, wearable, and Internet of Things (IoT) devices, a number of digital ASIC designs have presented customized dataflow and enhanced parallelism. However, in conventional digital designs, the biggest bottleneck for energy-efficient deep neural networks (DNNs) has reportedly been data access and movement. To eliminate the storage access bottleneck, new SRAM macros that support in-memory computing have recently been demonstrated. Several in-SRAM computing works have used a mix of analog and digital circuits to perform the XNOR-and-ACcumulate (XAC) operation without row-by-row memory access and can map a subset of DNNs with binary weights and binary activations. At the single-array level, large improvements in energy efficiency (e.g., two orders of magnitude) have been reported for computing XAC compared to digital-only hardware performing the same operation. In this article, by integrating many instances of such in-memory computing SRAM macros with an ensemble of peripheral digital circuits, we architect a new DNN accelerator, titled Vesti. This new accelerator is designed to support configurable multibit activations and large-scale DNNs seamlessly while substantially improving chip-level energy efficiency with a favorable accuracy tradeoff compared to conventional digital ASICs. Vesti also employs double-buffering with two groups of in-memory computing SRAMs, effectively hiding the row-by-row write latencies of in-memory computing SRAMs. The Vesti accelerator is fully designed and laid out in 65-nm CMOS, demonstrating ultralow energy consumption of < 20 nJ for MNIST classification and < 40 μJ for CIFAR-10 classification at a 1.0-V supply.
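The XNOR-and-ACcumulate (XAC) primitive the abstract refers to can be illustrated in software. The sketch below is a minimal functional model, not the paper's circuit or implementation (the function name `xac` and the bit packing are assumptions for illustration): in a binary network with weights and activations in {-1, +1}, encoding +1 as bit 1 and -1 as bit 0 reduces a dot product to a bitwise XNOR followed by a population count.

```python
# Functional sketch (assumption, not the paper's implementation) of the
# XNOR-and-ACcumulate (XAC) operation used by binary-weight, binary-activation
# DNNs. Values in {-1, +1} are packed as bits: +1 -> 1, -1 -> 0.

def xac(w_bits: int, a_bits: int, n: int) -> int:
    """Signed dot product of n binary (+/-1) values packed as integers."""
    mask = (1 << n) - 1
    xnor = ~(w_bits ^ a_bits) & mask   # bit is 1 wherever the signs agree
    matches = bin(xnor).count("1")     # population count of agreements
    # Each agreement contributes +1, each disagreement -1:
    return 2 * matches - n

# Example (bit 0 = first element):
# w = [+1, +1, -1, +1] -> 0b1011,  a = [+1, -1, +1, +1] -> 0b1101
# signed dot product = (+1) + (-1) + (-1) + (+1) = 0
print(xac(0b1011, 0b1101, 4))
```

An in-memory computing SRAM macro evaluates this XNOR-and-popcount across an entire array in parallel, which is the source of the energy-efficiency gains reported over row-by-row digital accumulation.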
