In: International Symposium on VLSI Design, Automation, and Test

End-to-end hardware accelerator for deep convolutional neural network



Abstract

Deep convolutional neural networks (CNNs) have achieved state-of-the-art accuracy in recognition, detection, and other computer vision tasks. Dedicated CNN hardware would enable mobile devices to meet real-time demands. However, CNN hardware design faces the challenges of high computational complexity, high data bandwidth, and large divergence across CNN layer types: the throughput of the convolutional layers is bounded by hardware resources, while the throughput of the fully connected layers is bounded by available data bandwidth. A highly flexible design with efficient hardware is therefore needed to meet both constraints. This talk presents our end-to-end CNN accelerator, which uses a shared filter kernel across all layers and an output-view strategy for maximum data reuse. The whole CNN architecture is modelled with a tile-based design that optimizes hardware resources and I/O data bandwidth for the target CNN network under design constraints. The final design is generated for the available resources and is reconfigured at run time with layer-optimized parameters to achieve real-time, end-to-end CNN acceleration.
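The run-time reconfiguration by layer-optimized parameters can also be sketched. The paper does not give its tile-selection procedure; the toy model below (our own assumption, including the buffer-cost formula) simply picks the largest square output tile whose working set, output tile plus the input patch it needs plus one plane of kernel weights, fits a fixed on-chip buffer, which is the kind of per-layer decision the accelerator would make before each layer runs:

```python
def choose_tile(out_h, out_w, c_out, c_in, k, buffer_words):
    """Pick the largest square output tile whose working set fits on chip.

    Hypothetical cost model: an output tile of side t needs
      c_out * t^2            words for the output accumulator,
      c_in * (t + k - 1)^2   words for the contributing input patch,
      c_out * c_in           words for one (ky, kx) plane of kernel weights.
    """
    best = 1
    for t in range(1, min(out_h, out_w) + 1):
        working = (c_out * t * t
                   + c_in * (t + k - 1) ** 2
                   + c_out * c_in)
        if working <= buffer_words:
            best = t
    return best
```

A fully connected layer degenerates to a 1x1 output, so the same model predicts its throughput is limited by how fast weights can be streamed in rather than by buffer size, matching the bandwidth-bound behavior the abstract describes.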
