
A high-efficiency runtime reconfigurable IP for CNN acceleration on a mid-range all-programmable SoC

Abstract

Convolutional Neural Networks (CNNs) are nature-inspired models, extensively employed in a broad range of applications in computer vision, machine learning and pattern recognition. The CNN algorithm requires the execution of multiple layers, commonly called convolution layers, which apply 2D convolution filters of different sizes to a set of input image features. Such a computation kernel is intrinsically parallel and thus benefits significantly from acceleration on parallel hardware. In this work, we propose an accelerator architecture, suitable for implementation on mid- to high-range FPGA devices, that can be reconfigured at runtime to adapt to the different filter sizes used in different convolution layers. We present an accelerator configuration, mapped onto a Xilinx Zynq XC-Z7045 device, that achieves up to 120 GMAC/s (16-bit precision) when executing 5×5 filters and up to 129 GMAC/s when executing 3×3 filters, while consuming less than 10 W of power, reaching more than 97% DSP resource utilization at a 150 MHz operating frequency, and requiring only 16 B/cycle of I/O bandwidth.
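
To make the accelerated kernel concrete, the following is a minimal software reference of a convolution layer: every output feature map accumulates K×K filter responses over all input feature maps, with K = 3 or 5 in the configurations above. Data is int16_t to mirror the 16-bit precision quoted in the abstract; the function name, memory layout and the final >>8 rescaling are illustrative assumptions, not the authors' implementation.

/* Minimal software reference of a CNN convolution layer: for every output
 * feature map, accumulate the K x K filter responses over all input feature
 * maps. int16_t data mirrors the 16-bit precision quoted in the abstract;
 * names, layout and the >>8 rescaling are illustrative assumptions only. */
#include <stdint.h>

void conv_layer(const int16_t *in,    /* [n_in][H][W] input feature maps      */
                const int16_t *filt,  /* [n_out][n_in][K][K] filter weights   */
                int16_t *out,         /* [n_out][H-K+1][W-K+1] output maps    */
                int n_in, int n_out,
                int H, int W, int K)  /* K = 3 or 5 in the paper's configs    */
{
    const int oh = H - K + 1, ow = W - K + 1;

    for (int o = 0; o < n_out; o++)
        for (int y = 0; y < oh; y++)
            for (int x = 0; x < ow; x++) {
                int32_t acc = 0;                       /* wide accumulator    */
                for (int i = 0; i < n_in; i++)
                    for (int ky = 0; ky < K; ky++)
                        for (int kx = 0; kx < K; kx++)
                            acc += (int32_t)in[(i * H + y + ky) * W + x + kx]
                                 * filt[((o * n_in + i) * K + ky) * K + kx];
                out[(o * oh + y) * ow + x] = (int16_t)(acc >> 8); /* rescale  */
            }
}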
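The quoted throughput can be cross-checked with a rough back-of-the-envelope estimate (our assumption, not a figure from the paper): if each of the XC-Z7045's 900 DSP48E1 slices performs one 16-bit MAC per cycle at 150 MHz, the peak rate is

\[ 900 \times 150\,\text{MHz} \times 1\,\tfrac{\text{MAC}}{\text{DSP}\cdot\text{cycle}} = 135\ \text{GMAC/s}, \]

so 129 GMAC/s (3×3) and 120 GMAC/s (5×5) correspond to roughly 96% and 89% of peak, consistent with the reported >97% DSP resource utilization.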
