Published in: Design, Automation & Test in Europe Conference & Exhibition

Chain-NN: An energy-efficient 1D chain architecture for accelerating deep convolutional neural networks

Abstract

Deep convolutional neural networks (CNNs) have shown good performance in many computer vision tasks. However, the high computational complexity of CNNs involves a huge amount of data movement between the processor core and the memory hierarchy, which accounts for the majority of the power consumption. This paper presents Chain-NN, a novel energy-efficient 1D chain architecture for accelerating deep CNNs. Chain-NN consists of dedicated dual-channel processing engines (PEs). In Chain-NN, convolutions are performed by 1D systolic primitives, each composed of a group of adjacent PEs. These systolic primitives, together with the proposed column-wise scan input pattern, can fully reuse input operands, reducing the memory bandwidth requirement and saving energy. Moreover, the 1D chain architecture allows the systolic primitives to be easily reconfigured according to specific CNN parameters with less design complexity. Chain-NN was synthesized and laid out in a TSMC 28nm process. It requires 3751k logic gates and 352KB of on-chip memory. The results show that a 576-PE Chain-NN can be clocked at up to 700MHz. This achieves a peak throughput of 806.4GOPS at 567.5mW and can accelerate the five convolutional layers of AlexNet at a frame rate of 326.2fps. Its power efficiency of 1421.0GOPS/W is 2.5x to 4.1x better than state-of-the-art works.
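The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below assumes that each PE completes one multiply-accumulate per cycle and that a multiply-accumulate counts as two operations; the abstract does not state this, but the assumption reproduces the quoted numbers. The function and constant names are illustrative only.

```python
# Cross-check of the throughput and efficiency figures quoted in the abstract.
# Values taken from the abstract:
NUM_PES = 576        # number of processing engines
CLOCK_HZ = 700e6     # 700 MHz clock
POWER_W = 0.5675     # 567.5 mW

# Assumption (not stated in the abstract): one MAC per PE per cycle,
# counted as 2 operations (one multiply plus one add).
OPS_PER_PE_PER_CYCLE = 2


def peak_gops(num_pes, clock_hz, ops_per_pe_per_cycle=OPS_PER_PE_PER_CYCLE):
    """Peak throughput in giga-operations per second."""
    return num_pes * clock_hz * ops_per_pe_per_cycle / 1e9


def gops_per_watt(gops, power_w):
    """Power efficiency in GOPS/W."""
    return gops / power_w


throughput = peak_gops(NUM_PES, CLOCK_HZ)
efficiency = gops_per_watt(throughput, POWER_W)

print(f"peak throughput:  {throughput:.1f} GOPS")
print(f"power efficiency: {efficiency:.1f} GOPS/W")
```

Under these assumptions the script prints 806.4 GOPS and 1421.0 GOPS/W, matching the values reported in the abstract.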
