Journal: IEEE Transactions on Circuits and Systems II: Express Briefs

A Resource-Limited Hardware Accelerator for Convolutional Neural Networks in Embedded Vision Applications



Abstract

In this brief, we introduce an architecture for accelerating the convolution stages of convolutional neural networks (CNNs) in embedded vision systems. The architecture exploits the inherent parallelism of CNNs to reduce the bandwidth, resource usage, and power consumption of computationally intensive convolution operations, as real-time embedded applications require. We implement the proposed architecture in fixed-point arithmetic on a ZC706 evaluation board featuring a Xilinx Zynq-7000 system-on-chip, whose embedded ARM processor, running at a high clock speed, serves as the main controller to increase flexibility and speed. The proposed architecture runs at 150 MHz and achieves 19.2 giga multiply-accumulate operations per second (GMAC/s) while consuming less than 10 W of power. It does so using only 391 DSP48 modules, a significant improvement in utilization over state-of-the-art architectures.
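As a rough sanity check on the figures quoted in the abstract, the short Python sketch below derives a few implied quantities from them: MAC operations per clock cycle, average throughput per DSP48 module, and a lower bound on throughput per watt. Only the four constants come from the abstract. The function names are illustrative, and the per-DSP figure is a plain average that assumes nothing about how the design maps operations onto DSP48 modules, which the abstract does not describe.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 150e6            # 150 MHz operating frequency
THROUGHPUT_MACS = 19.2e9    # 19.2 giga multiply-accumulate operations per second
DSP48_COUNT = 391           # DSP48 modules used
POWER_BOUND_W = 10.0        # "less than 10 W", so this is an upper bound on power


def macs_per_cycle(throughput_macs: float, clock_hz: float) -> float:
    """MAC operations completed per clock cycle, on average."""
    return throughput_macs / clock_hz


def macs_per_second_per_dsp(throughput_macs: float, dsp_count: int) -> float:
    """Average throughput attributed to each DSP48 module (a plain ratio)."""
    return throughput_macs / dsp_count


def min_macs_per_second_per_watt(throughput_macs: float, power_bound_w: float) -> float:
    """Lower bound on efficiency, since the actual power is below the bound."""
    return throughput_macs / power_bound_w


if __name__ == "__main__":
    print("MACs per cycle:", macs_per_cycle(THROUGHPUT_MACS, CLOCK_HZ))
    print("MAC/s per DSP48:", macs_per_second_per_dsp(THROUGHPUT_MACS, DSP48_COUNT))
    print("MAC/s per watt (at least):",
          min_macs_per_second_per_watt(THROUGHPUT_MACS, POWER_BOUND_W))
```

With the abstract's numbers, 19.2 GMAC/s at 150 MHz corresponds to 128 MAC operations per cycle, and the power bound implies an efficiency of more than 1.92 GMAC/s per watt.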

