IEEE/ACM International Conference on Computer-Aided Design

Re-architecting the on-chip memory sub-system of machine-learning accelerator for embedded devices



Abstract

The rapid development of deep learning is enabling a wealth of novel applications, such as image and speech recognition, for embedded systems, robotics, and smart wearable devices. However, typical deep learning models such as deep convolutional neural networks (CNNs) consume so much on-chip storage and high-throughput compute capacity that they cannot easily be handled by mobile or embedded devices with tight silicon and power budgets. To enable large CNN models on mobile and other cutting-edge devices for IoT or cyber-physical applications, we propose an efficient on-chip memory architecture for CNN inference acceleration and show its application to our in-house general-purpose deep learning accelerator. The redesigned on-chip memory subsystem, Memsqueezer, includes an active weight-buffer set and a data-buffer set that employ specialized compression methods to reduce the footprint of CNN weights and data, respectively. The Memsqueezer buffers compress the data and weight sets according to their distinct characteristics, and they also include a built-in redundancy-detection mechanism that actively scans the working set of a CNN and boosts inference performance by eliminating data redundancy. Our experiments show that CNN accelerators with Memsqueezer buffers achieve more than a 2× performance improvement and reduce energy consumption by 80% on average over conventional buffer designs with the same area budget.
