IEEE Transactions on Circuits and Systems I: Regular Papers

A Precision-Scalable Energy-Efficient Convolutional Neural Network Accelerator



Abstract

Quantization is a promising technique for compressing Convolutional Neural Network (CNN) models. Recently, various precision-scalable designs have been presented to reduce the computational complexity of CNNs. However, most of them adopt a straightforward calculation scheme to implement the CNN, which incurs high bandwidth requirements and low hardware utilization efficiency. This paper proposes a new precision-scalable architecture that fully reduces the computational complexity of CNN inference while employing a finely simplified calculation scheme. Based on the proposed scheme, a well-optimized multiplier called the Compositional Processing Element (C-PE) is devised. Compared with previous multipliers, the new C-PE requires less area and power. Furthermore, two levels of optimization are introduced into the design to relieve the bandwidth problem and increase hardware utilization efficiency. Implemented in TSMC 90 nm CMOS technology, the whole design achieves 6-68.1 fps at various precisions on the VGG16 benchmark and, when scaled to 28 nm, an energy efficiency of 49.8 TOPS/W at 500 MHz, which is much better than previous precision-scalable designs.
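The core idea behind compositional, precision-scalable multipliers like the C-PE is that one wide multiplication can be assembled from many low-precision sub-multiplications, so the same small units serve several quantization levels. The paper's actual C-PE microarchitecture is not detailed in the abstract; the sketch below only illustrates the general composition principle, assuming unsigned operands split into 2-bit chunks (the chunk width and function names are illustrative, not from the paper).

```python
# Illustrative sketch (NOT the paper's C-PE design): composing a wider
# multiplication from 2-bit sub-multiplications, the way a
# precision-scalable PE reuses small units for 2-, 4-, or 8-bit modes.

def split_chunks(x: int, bits: int, chunk: int = 2):
    """Split an unsigned `bits`-wide value into little-endian `chunk`-bit pieces."""
    mask = (1 << chunk) - 1
    return [(x >> (i * chunk)) & mask for i in range(bits // chunk)]

def scalable_mul(a: int, b: int, bits: int) -> int:
    """Multiply two unsigned `bits`-wide operands using only 2-bit x 2-bit
    partial products, shifted to their weight and accumulated."""
    acc = 0
    for i, ca in enumerate(split_chunks(a, bits)):
        for j, cb in enumerate(split_chunks(b, bits)):
            acc += (ca * cb) << (2 * (i + j))  # weight of chunk pair (i, j)
    return acc

# The same sub-units handle every supported precision:
assert scalable_mul(3, 2, 2) == 6          # 2-bit mode
assert scalable_mul(11, 13, 4) == 143      # 4-bit mode
assert scalable_mul(200, 123, 8) == 24600  # 8-bit mode
```

In hardware, the shifted accumulation is done by a fixed adder tree rather than a loop; the point is that lower precisions either gate off sub-units or pack multiple independent operands into them, which is where the energy scaling comes from.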


