Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs

Abstract

In recent years, convolutional neural networks (CNNs) have become widely adopted for computer vision tasks. Field-programmable gate arrays (FPGAs) have been extensively explored as promising hardware accelerators for CNNs due to their high performance, energy efficiency, and reconfigurability. However, prior FPGA solutions based on the conventional convolution algorithm are often bounded by the computational capability of FPGAs (e.g., the number of DSPs). To address this problem, the feature maps are transformed to a special domain using fast algorithms that reduce the arithmetic complexity. Winograd and the fast Fourier transform (FFT), as representative fast algorithms, first transform the input data and filter to the Winograd or frequency domain, then perform element-wise multiplication, and finally apply an inverse transformation to obtain the output. In this paper, we propose a novel architecture for implementing fast algorithms on FPGAs. Our design employs a line buffer structure to effectively reuse the feature map data among different tiles. We also pipeline the Winograd/FFT processing element (PE) engine and instantiate multiple PEs for parallelism. This leaves a complex design space to explore, so we propose an analytical model to predict resource usage and performance, and then use the model to guide a fast design space exploration. Experiments using state-of-the-art CNNs demonstrate the best performance and energy efficiency on FPGAs. We achieve 854.6 and 2479.6 GOP/s for AlexNet and VGG16, respectively, on the Xilinx ZCU102 platform using Winograd. On the Xilinx ZC706 platform, we achieve 130.4 GOP/s for ResNet using Winograd and 201.1 GOP/s for YOLO using FFT.
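The transform, multiply, inverse-transform flow described in the abstract can be illustrated with the smallest common Winograd case, 1-D F(2,3), which produces two outputs of a 3-tap filter using 4 multiplications instead of the 6 needed by direct computation. The sketch below is a plain-Python software illustration only, not the paper's FPGA architecture; the function names (`winograd_f23`, `direct_conv`, `matvec`) are hypothetical, and the matrices are the standard F(2,3) transform matrices.

```python
# Minimal software sketch of 1-D Winograd F(2,3); illustrative only,
# not the hardware design from the paper.

# Standard F(2,3) transform matrices.
BT = [[1, 0, -1, 0],      # input transform (B^T)
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0],     # filter transform (G)
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]
AT = [[1, 1, 1, 0],       # inverse (output) transform (A^T)
      [0, 1, -1, -1]]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def winograd_f23(d, g):
    """Two outputs of a 3-tap filter g over a 4-element input tile d."""
    u = matvec(G, g)                        # filter -> Winograd domain
    v = matvec(BT, d)                       # input tile -> Winograd domain
    m = [ui * vi for ui, vi in zip(u, v)]   # 4 element-wise multiplications
    return matvec(AT, m)                    # back to the spatial domain


def direct_conv(d, g):
    """Reference: the same two outputs computed directly (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


if __name__ == "__main__":
    tile = [1, 2, 3, 4]
    filt = [1, 0, -1]
    print(winograd_f23(tile, filt))
    print(direct_conv(tile, filt))
```

In a CNN setting the filter transform can be precomputed once per filter, so only the input transform, the element-wise products, and the output transform remain per tile. The 2-D variants used for convolutional layers apply the same matrices along both dimensions.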
