International Journal of Reconfigurable Computing

An FPGA-Based Hardware Accelerator for CNNs Using On-Chip Memories Only: Design and Benchmarking with Intel Movidius Neural Compute Stick


Abstract

In recent years, convolutional neural networks have been used for a wide range of applications, thanks to their ability to carry out tasks with a reduced number of parameters compared with other deep learning approaches. However, the power consumption and memory footprint constraints typical of edge and portable applications usually conflict with accuracy and latency requirements. For this reason, commercial hardware accelerators have become popular, since their architectures are designed for the inference of general convolutional neural network models. Nevertheless, field-programmable gate arrays represent an interesting alternative, since they offer the possibility to implement a hardware architecture tailored to a specific convolutional neural network model, with promising results in terms of latency and power consumption. In this article, we propose a full on-chip field-programmable gate array hardware accelerator for a separable convolutional neural network designed for a keyword spotting application. We started from the model implemented in a previous work for the Intel Movidius Neural Compute Stick. For our goals, we quantized this model through a bit-true simulation and realized a dedicated architecture that uses on-chip memories exclusively. We then benchmarked implementations on different field-programmable gate array families from Xilinx and Intel against the implementation on the Neural Compute Stick. The analysis shows that the FPGA solution achieves better inference time and energy-per-inference results with comparable accuracy, at the expense of a higher design effort and development time.
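The bit-true simulation mentioned above evaluates the network with the same rounding and saturation behavior that the fixed-point FPGA datapath will exhibit. A minimal sketch of such a quantization step is shown below; the Q-format word lengths (`int_bits`, `frac_bits`) are illustrative assumptions, not the actual precisions chosen in the paper.

```python
import numpy as np

def quantize(x, int_bits, frac_bits):
    """Quantize a float array to signed fixed-point Q(int_bits.frac_bits).

    Values are rounded to the nearest representable step and saturated to
    the signed range, mimicking bit-true FPGA arithmetic. The word lengths
    here are illustrative placeholders.
    """
    scale = 2.0 ** frac_bits
    lo = -(2.0 ** (int_bits + frac_bits - 1))       # smallest signed code
    hi = 2.0 ** (int_bits + frac_bits - 1) - 1      # largest signed code
    codes = np.clip(np.round(x * scale), lo, hi)    # round, then saturate
    return codes / scale                            # back to real values

# Example: 8-bit Q2.6 quantization of a few weights; 2.9 saturates.
w = np.array([0.7, -1.3, 0.031, 2.9])
print(quantize(w, int_bits=2, frac_bits=6))
```

Running the quantized model end to end with such an operator in place of floating-point arithmetic lets the accuracy loss be measured before committing to a hardware word length.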
