
ARA: Cross-Layer approximate computing framework based reconfigurable architecture for CNNs



Abstract

Convolutional neural networks (CNNs) are now widely used in image processing, object detection, video detection, and other classification tasks. Because of their computational complexity and data dependence, CNN acceleration is also widely researched. To achieve high energy efficiency, we propose a CNN accelerator that uses approximate computing techniques. This paper studies two main aspects: hardware-compatible network compression algorithms, and approximate computing units and architectures with hardware resource scheduling strategies. On the algorithm side, we introduce a dynamic layered CNN structure for different scales of input, a convolution kernel shrinking strategy with layer-by-layer quantization to compress networks, and the Winograd minimal filtering algorithm to reduce the number of operations in convolution layers. On the architecture side, two types of approximate multipliers are proposed: iterative multipliers and multi-port SRAM-integrated LUT-based multipliers. Approximate adders with error correction logic are also designed. Based on these approximate computing units, a Convolution Neural Processing Unit named CNPU is proposed, with reconfigurable datapath designs for mapping different tasks. Combining the algorithm work, the CNPU architecture, and the datapath design, we propose a highly energy-efficient reconfigurable CNN accelerator with approximate computing named ARA (Approximate computing based Reconfigurable Architecture). Implemented in a TSMC 45 nm process, our accelerator achieves energy efficiencies of 1.92 TOPS/W at 1.1 V, 200 MHz and 3.72 TOPS/W at 0.9 V, 40 MHz, which is 1.51 to 4.36 times better than state-of-the-art accelerators.
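The abstract names the Winograd algorithm as the way operations in convolution layers are reduced. As a rough illustration of the idea only, the sketch below shows the textbook 1-D case F(2,3), which produces two outputs of a 3-tap filter with four multiplications instead of six. It is not taken from the paper: the tile size the authors use is not stated in the abstract, and the function names `winograd_f23` and `direct_f23` are hypothetical.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter over 4 inputs, using 4 multiplications.

    Textbook F(2,3) form; assumed here for illustration, not the paper's design.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: depends only on the kernel, so it can be precomputed.
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # The only four data-by-filter multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


def direct_f23(d, g):
    """Reference sliding-window computation using 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


if __name__ == "__main__":
    data, kernel = [1, 2, 3, 4], [2, 4, 6]
    print(winograd_f23(data, kernel), direct_f23(data, kernel))
```

Both functions compute the same two outputs; the saving comes from trading multiplications for additions, which is generally the cheaper operation in hardware.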

Bibliographic details

  • Source
    Microelectronics Journal | 2019, Issue 5 | pp. 33-44 | 12 pages
  • Author affiliations

    Southeast Univ Natl ASIC Syst Engn Technol Res Ctr Nanjing 210096 Jiangsu Peoples R China;

    Southeast Univ Natl ASIC Syst Engn Technol Res Ctr Nanjing 210096 Jiangsu Peoples R China;

    Southeast Univ Natl ASIC Syst Engn Technol Res Ctr Nanjing 210096 Jiangsu Peoples R China;

    Southeast Univ Natl ASIC Syst Engn Technol Res Ctr Nanjing 210096 Jiangsu Peoples R China;

  • Indexing: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Convolution Neural Networks; Approximate computing; Reconfigurable computing;

