IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

MALOC: A Fully Pipelined FPGA Accelerator for Convolutional Neural Networks With All Layers Mapped on Chip



Abstract

Recently, field-programmable gate arrays (FPGAs) have been widely used to implement hardware accelerators for convolutional neural networks (CNNs). However, most existing accelerators follow the same design philosophy as their ASIC counterparts: operations from all layers are mapped onto the same hardware units and executed in a time-multiplexed fashion. This approach does not take full advantage of the reconfigurability and customizability of FPGAs, degrading computational efficiency to some degree. In this paper, we propose a new FPGA-based CNN accelerator architecture that maps every layer to its own on-chip unit, with all units working concurrently as a pipeline. We also propose a comprehensive mapping and optimization methodology, built on a roofline-model-oriented optimization model, that achieves maximum resource utilization together with optimal computational efficiency. In addition, to ease the programming burden, we present a design framework that offers developers a one-stop flow for generating accelerators with our optimization methodology. We evaluate our proposal by implementing several modern CNN models on Xilinx Zynq-7020 and Virtex-7 690t FPGA platforms. Experimental results show that our implementations achieve a peak performance of 910.2 GOP/s on the Virtex-7 690t and an energy efficiency of 36.36 GOP/s/W on the Zynq-7020, both superior to previous approaches.
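The abstract's optimization methodology is built on the roofline model, which bounds a kernel's attainable throughput by the platform's compute peak and by its memory bandwidth scaled by the kernel's computation-to-communication ratio. A minimal sketch of that bound (the function name, parameter values, and numbers below are illustrative, not taken from the paper):

```python
def roofline_attainable_gops(peak_gops, bandwidth_gbs, ops, bytes_accessed):
    """Attainable performance (GOP/s) of a layer under the roofline model.

    peak_gops      : platform computational peak (GOP/s)
    bandwidth_gbs  : off-chip memory bandwidth (GB/s)
    ops            : total operations the layer performs
    bytes_accessed : total off-chip bytes the layer moves
    """
    # Computation-to-communication ratio (operations per byte).
    ctc_ratio = ops / bytes_accessed
    # The kernel is limited by whichever roof it hits first.
    return min(peak_gops, ctc_ratio * bandwidth_gbs)

# Illustrative platform: 900 GOP/s peak, 12.8 GB/s bandwidth.
# A memory-bound layer (2.5 op/byte) vs. a compute-bound layer (100 op/byte):
mem_bound = roofline_attainable_gops(900, 12.8, ops=1e9, bytes_accessed=4e8)
cmp_bound = roofline_attainable_gops(900, 12.8, ops=1e9, bytes_accessed=1e7)
print(mem_bound)  # 32.0  -> bandwidth roof dominates
print(cmp_bound)  # 900   -> compute roof dominates
```

Mapping all layers on chip, as the paper proposes, raises each layer's effective computation-to-communication ratio by keeping intermediate feature maps in on-chip buffers, pushing layers toward the compute roof.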

