
A high-efficiency runtime reconfigurable IP for CNN acceleration on a mid-range all-programmable SoC

Abstract

Convolutional Neural Networks (CNNs) are nature-inspired models, extensively employed in a broad range of applications in computer vision, machine learning and pattern recognition. The CNN algorithm requires the execution of multiple layers, commonly called convolution layers, which apply 2D convolution filters of different sizes to a set of input image features. Such a computation kernel is intrinsically parallel and thus benefits significantly from acceleration on parallel hardware. In this work, we propose an accelerator architecture, suitable for implementation on mid- to high-range FPGA devices, that can be reconfigured at runtime to adapt to the different filter sizes used in different convolution layers. We present an accelerator configuration, mapped onto a Xilinx Zynq XC-Z7045 device, that achieves up to 120 GMAC/s (16-bit precision) when executing 5×5 filters and up to 129 GMAC/s when executing 3×3 filters, while consuming less than 10 W of power, reaching more than 97% DSP resource utilization at a 150 MHz operating frequency, and requiring only 16 B/cycle of I/O bandwidth.
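
To make the accelerated kernel concrete, the following is a minimal software reference of a convolution layer: every output feature map accumulates K×K filter responses over all input feature maps, with K = 3 or 5 in the configurations above. Data is int16_t to mirror the 16-bit precision quoted in the abstract; the function name, memory layout and the final >>8 rescaling are illustrative assumptions, not the authors' implementation.

/* Minimal software reference of a CNN convolution layer: for every output
 * feature map, accumulate the K x K filter responses over all input feature
 * maps. int16_t data mirrors the 16-bit precision quoted in the abstract;
 * names, layout and the >>8 rescaling are illustrative assumptions only. */
#include <stdint.h>

void conv_layer(const int16_t *in,    /* [n_in][H][W] input feature maps      */
                const int16_t *filt,  /* [n_out][n_in][K][K] filter weights   */
                int16_t *out,         /* [n_out][H-K+1][W-K+1] output maps    */
                int n_in, int n_out,
                int H, int W, int K)  /* K = 3 or 5 in the paper's configs    */
{
    const int oh = H - K + 1, ow = W - K + 1;

    for (int o = 0; o < n_out; o++)
        for (int y = 0; y < oh; y++)
            for (int x = 0; x < ow; x++) {
                int32_t acc = 0;                       /* wide accumulator    */
                for (int i = 0; i < n_in; i++)
                    for (int ky = 0; ky < K; ky++)
                        for (int kx = 0; kx < K; kx++)
                            acc += (int32_t)in[(i * H + y + ky) * W + x + kx]
                                 * filt[((o * n_in + i) * K + ky) * K + kx];
                out[(o * oh + y) * ow + x] = (int16_t)(acc >> 8); /* rescale  */
            }
}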
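The quoted throughput can be cross-checked with a rough back-of-the-envelope estimate (our assumption, not a figure from the paper): if each of the XC-Z7045's 900 DSP48E1 slices performs one 16-bit MAC per cycle at 150 MHz, the peak rate is

\[ 900 \times 150\,\text{MHz} \times 1\,\tfrac{\text{MAC}}{\text{DSP}\cdot\text{cycle}} = 135\ \text{GMAC/s}, \]

so 129 GMAC/s (3×3) and 120 GMAC/s (5×5) correspond to roughly 96% and 89% of peak, consistent with the reported >97% DSP resource utilization.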
