WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs

Abstract

The combination of Winograd's algorithm and the systolic array architecture has been shown to improve DSP efficiency when accelerating convolutional neural networks (CNNs) on FPGA platforms. However, handling arbitrary convolution kernel sizes in FPGA-based Winograd processing elements and supporting efficient data access remain underexplored. In this work, we are the first to propose an optimized Winograd processing element (WinoPE), which naturally supports multiple convolution kernel sizes with the same amount of computing resources while maintaining high runtime DSP efficiency. Using the proposed WinoPE, we construct a highly efficient systolic array accelerator, termed WinoCNN. We also propose a dedicated memory subsystem to optimize data access. Based on the accelerator architecture, we build accurate resource and performance models to explore optimal accelerator configurations under different resource constraints. We implement the proposed accelerator on multiple FPGAs, where it outperforms state-of-the-art designs in terms of both throughput and DSP efficiency. On the Xilinx ZCU102 FPGA, our implementation achieves a DSP efficiency of up to 1.33 GOPS/DSP and a throughput of up to 3.1 TOPS, which are 29.1% and 20.0% better, respectively, than the best previously reported solutions.
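As background for the abstract, the sketch below shows the textbook 1D Winograd minimal filtering case F(2, 3), as popularized for CNNs by Lavin and Gray, and checks it against direct convolution: two outputs of a 3-tap filter cost 4 multiplications instead of 6. This is not the WinoPE design from the paper, and the function names are hypothetical. The final loop illustrates a general property that F(m, r) tiles with the same size m + r - 1 need the same number of multiplications (for example F(4, 3) and F(2, 5)). This is one plausible way a fixed-resource processing element could serve several kernel sizes, but whether WinoCNN relies on exactly this property is an assumption here.

```python
import math
from typing import List

# Hypothetical helper names; textbook F(2, 3), not the WinoCNN WinoPE implementation.


def direct_conv1d(d: List[float], g: List[float]) -> List[float]:
    """Reference: direct 'valid' 1D correlation, as used in CNN convolution layers.
    Needs len(g) multiplications per output."""
    r = len(g)
    return [sum(d[i + k] * g[k] for k in range(r)) for i in range(len(d) - r + 1)]


def winograd_f2_3(d: List[float], g: List[float]) -> List[float]:
    """Winograd minimal filtering F(2, 3): 2 outputs of a 3-tap filter from a
    4-element input tile using 4 multiplications (the direct method uses 6)."""
    if len(d) != 4 or len(g) != 3:
        raise ValueError("F(2, 3) expects a 4-element tile and a 3-tap filter")
    # Filter transform G g: depends only on the weights, so it can be precomputed.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform B^T d (additions only) followed by the 4 elementwise products.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform A^T m (additions only).
    return [m1 + m2 + m3, m2 - m3 - m4]


def winograd_mults_1d(m: int, r: int) -> int:
    """Elementwise multiplications per 1D tile for F(m, r): the tile size m + r - 1.
    In 2D, F(m x m, r x r) needs (m + r - 1) ** 2."""
    return m + r - 1


if __name__ == "__main__":
    d, g = [1, 2, 3, 4], [1, 2, 3]
    y_wino = winograd_f2_3(d, g)
    y_ref = direct_conv1d(d, g)
    assert all(math.isclose(a, b) for a, b in zip(y_wino, y_ref))
    print(y_wino, y_ref)  # [14.0, 20.0] [14, 20]
    # Different (m, r) pairs with the same tile size use the same number of
    # multiplications, e.g. F(4, 3) and F(2, 5) both use a 6-element tile.
    for m, r in [(4, 3), (2, 5)]:
        print(f"F({m},{r}): {winograd_mults_1d(m, r)} mults/tile (1D), "
              f"{winograd_mults_1d(m, r) ** 2} (2D), direct {m * r} (1D)")
```

The filter transform depends only on the weights, which is why Winograd accelerators typically precompute it and spend their DSP multipliers on the elementwise products alone.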

Bibliographic Details

Source
International Conference on Application-specific Systems, Architectures and Processors, 2021, pp. 258-265 (8 pages)
Authors
Xinheng Liu; Yao Chen; Cong Hao; Ashutosh Dhar; Deming Chen
Original format: PDF
Keywords
Runtime; Program processors; Convolution; Memory management; Systems architecture; Throughput; Arrays

Similar Literature

1. Exploring Efficient Acceleration Architecture for Winograd-Transformed Transposed Convolution of GANs on FPGAs [J]. Progress in Artificial Intelligence, 2020, no. 2.
2. WinoNN: Optimizing FPGA-Based Convolutional Neural Network Accelerators Using Sparse Winograd Algorithm [J]. Wang Xuan, Wang Chao, Cao Jing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, no. 11.
3. An Efficient Hardware Accelerator for Structured Sparse Convolutional Neural Networks on FPGAs [J]. Zhu Chaoyang, Huang Kejie, Yang Shuyuan. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2020, no. 9.
4. SpWA: An Efficient Sparse Winograd Convolutional Neural Networks Accelerator on FPGAs [C]. Liqiang Lu, Yun Liang. 2018 55th ACM/ESDA/IEEE Design Automation Conference, 2018.
5. Architecture and Automation for Efficient Convolutional Neural Network Acceleration on Field Programmable Gate Arrays [D]. Hall, Mathew Kent. 2020.
6. FPGA Implementation for Odor Identification with Depthwise Separable Convolutional Neural Network [O]. Zhuofeng Mo, Dehan Luo, Tengteng Wen. 2021.
7. Sparse Winograd Convolutional Neural Networks on Small-scale Systolic Arrays [O]. Feng Shi, Haochen Li, Yuhe Gao. 2019.
