IEEE Annual International Symposium on Field-Programmable Custom Computing Machines

An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs

Abstract

Deep convolutional neural networks (CNNs) have achieved remarkable performance at the cost of huge computation. As CNN models become more complex and deeper, compressing CNNs into sparse models by pruning redundant connections has emerged as an attractive approach to reduce both the amount of computation and the memory requirement. In recent years, FPGAs have been demonstrated to be an effective hardware platform for accelerating CNN inference. However, most existing FPGA architectures focus on dense CNN models. Architectures designed for dense models are inefficient when executing sparse ones, as most of their arithmetic operations involve additions and multiplications with zero operands. On the other hand, recent sparse FPGA accelerators focus only on fully-connected (FC) layers. In this work, we aim to develop an FPGA accelerator for sparse CNNs. To efficiently handle the irregular connections in sparse convolutional layers, we propose a weight-oriented dataflow that processes each weight individually. We then design an FPGA architecture that handles both input-weight connections and weight-output connections efficiently. For input-weight connections, we design a tile look-up table that eliminates runtime index matching of compressed weights. Moreover, we develop a weight layout that enables high on-chip memory access efficiency. To cooperate with this weight layout, a channel multiplexer is inserted to locate the proper address, ensuring conflict-free data access. Experiments demonstrate that our accelerator achieves 223.4-309.0 GOP/s for modern CNNs on a Xilinx ZCU102, a 3.6x-12.9x speedup over previous dense CNN FPGA accelerators.
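To make the weight-oriented dataflow concrete, below is a minimal software sketch of the idea described in the abstract: each weight stored in the compressed model is processed individually, multiplied against the whole window of input pixels it connects to, and accumulated into the output positions it produces. The function name, the (value, c_out, c_in, kh, kw) weight encoding, and the stride-1 assumption are all illustrative and not taken from the paper; the tile look-up table, weight layout, and channel multiplexer are hardware structures for performing this matching and scattering in parallel without bank conflicts, which plain array slicing does not capture.

```python
import numpy as np

def sparse_conv2d_weight_oriented(ifmap, nz_weights, K, pad=1):
    """Stride-1 sparse convolution processed one nonzero weight at a time.

    ifmap:      (C_in, H, W) input feature map.
    nz_weights: list of (value, c_out, c_in, kh, kw) tuples holding only
                the nonzero entries of the K x K kernels (an illustrative
                compressed weight representation, assumed here).
    """
    C_in, H, W = ifmap.shape
    H_out, W_out = H + 2 * pad - K + 1, W + 2 * pad - K + 1
    C_out = 1 + max(t[1] for t in nz_weights)
    padded = np.pad(ifmap, ((0, 0), (pad, pad), (pad, pad)))
    ofmap = np.zeros((C_out, H_out, W_out), dtype=ifmap.dtype)

    # Weight-oriented loop: every stored weight is nonzero, so every
    # multiply-accumulate below is useful work; no cycles are spent on
    # zero operands, unlike a dense dataflow running a pruned model.
    for value, c_out, c_in, kh, kw in nz_weights:
        # Input-weight connection: the window of input pixels this weight
        # touches. Weight-output connection: the matching block of partial
        # sums it updates.
        ofmap[c_out] += value * padded[c_in, kh:kh + H_out, kw:kw + W_out]
    return ofmap

# Usage: a 3x3 layer with 2 input channels where only 2 of its
# 2 * 9 = 18 weights survived pruning (~89% sparsity).
ifmap = np.random.rand(2, 8, 8).astype(np.float32)
nzw = [(0.5, 0, 0, 1, 1), (-1.0, 0, 1, 0, 2)]
out = sparse_conv2d_weight_oriented(ifmap, nzw, K=3)  # shape (1, 8, 8)
```

Note how the inner update touches one input window and one output block per weight; the accelerator's tile look-up table plays the role of finding those windows for compressed weights without a runtime index search, per the abstract.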