Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs

Liang Yun; Lu Liqiang; Xiao Qingcheng; Yan Shengen

Abstract

In recent years, convolutional neural networks (CNNs) have become widely adopted for computer vision tasks. Field-programmable gate arrays (FPGAs) have been explored extensively as promising hardware accelerators for CNNs due to their high performance, energy efficiency, and reconfigurability. However, prior FPGA solutions based on the conventional convolutional algorithm are often bounded by the computational capability of FPGAs (e.g., the number of DSPs). To address this problem, the feature maps are transformed to a special domain using fast algorithms to reduce the arithmetic complexity. Winograd and fast Fourier transformation (FFT), as representative fast algorithms, first transform the input data and filter to the Winograd or frequency domain, then perform element-wise multiplication, and finally apply an inverse transformation to get the final output. In this paper, we propose a novel architecture for implementing fast algorithms on FPGAs. Our design employs a line buffer structure to effectively reuse the feature map data among different tiles. We also pipeline the Winograd/FFT processing element (PE) engine and instantiate multiple PEs through parallelization. This leaves a complex design space to explore, so we propose an analytical model to predict the resource usage and the performance, and then use the model to guide a fast design space exploration. Experiments using state-of-the-art CNNs demonstrate the best performance and energy efficiency on FPGAs. We achieve 854.6 and 2479.6 GOP/s for AlexNet and VGG16, respectively, on the Xilinx ZCU102 platform using Winograd. On the Xilinx ZC706 platform, we achieve 130.4 GOP/s for ResNet using Winograd and 201.1 GOP/s for YOLO using FFT.
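
The three-step flow described in the abstract (transform, element-wise multiply, inverse transform) can be illustrated with the smallest standard Winograd case, F(2,3), which computes two outputs of a 3-tap filter from a 4-element input tile. The following is a minimal software sketch in plain Python, not the paper's FPGA architecture; the function names are hypothetical and chosen only for illustration.

```python
# Minimal sketch of 1-D Winograd F(2,3): 2 outputs, 3-tap filter, 4-input tile.
# Function names are illustrative assumptions, not taken from the paper.

def transform_filter(g):
    """Map a 3-tap filter into the 4-element Winograd domain."""
    g0, g1, g2 = g
    return [g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2]

def transform_input(d):
    """Map a 4-element input tile into the Winograd domain (additions only)."""
    d0, d1, d2, d3 = d
    return [d0 - d2, d1 + d2, d2 - d1, d1 - d3]

def inverse_transform(m):
    """Map the 4 element-wise products back to the 2 outputs."""
    m0, m1, m2, m3 = m
    return [m0 + m1 + m2, m1 - m2 - m3]

def winograd_f2_3(d, g):
    u = transform_filter(g)
    v = transform_input(d)
    # Element-wise multiplication: 4 multiplies instead of 6.
    m = [ui * vi for ui, vi in zip(u, v)]
    return inverse_transform(m)

def direct_f2_3(d, g):
    """Conventional sliding-window computation (6 multiplies) for comparison."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

if __name__ == "__main__":
    tile, taps = [1, 2, 3, 4], [1, 0, -1]
    print(winograd_f2_3(tile, taps), direct_f2_3(tile, taps))
```

In this sketch the filter transform can be computed once and reused across tiles, so the per-tile cost is the input transform, four multiplications, and the inverse transform. Fewer multiplications per output is the reduction in arithmetic complexity that the abstract refers to.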

Bibliographic Record

Source
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, Issue 4, pp. 857-870 (14 pages)
Authors
Liang Yun; Lu Liqiang; Xiao Qingcheng; Yan Shengen
Author affiliations

Peking Univ Sch EECS Beijing 100871 Peoples R China|Peng Cheng Lab Shenzhen 518055 Peoples R China;

Peking Univ Ctr Energy Efficient Comp & Applicat Beijing 100871 Peoples R China;

Peking Univ Ctr Energy Efficient Comp & Applicat Beijing 100871 Peoples R China;

SenseTime Algorithm Platform Dept Hong Kong Peoples R China;

Original format: PDF
Language: English
Keywords
Field programmable gate arrays; Convolution; Space exploration; Prediction algorithms; Transforms; Analytical models; Convolutional neural networks; Convolutional neural network (CNN); fast algorithm; fast Fourier transformation (FFT); field-programmable gate array (FPGA); Winograd;
