IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

WinoNN: Optimizing FPGA-Based Convolutional Neural Network Accelerators Using Sparse Winograd Algorithm



Abstract

In recent years, a variety of FPGA-based accelerators have been proposed to speed up convolutional neural networks (CNNs) in many domain-specific application fields. In addition, optimization algorithms such as fast convolution algorithms and network sparsity have greatly reduced the theoretical computational workload of CNN inference. A few FPGA accelerators currently support both the fast Winograd algorithm (WinoA) and network sparsity to minimize the amount of computation. However, on the one hand, these architectures feed data into processing elements (PEs) in units of blocks, so boundary losses caused by sparse irregularity cannot be avoided. On the other hand, these works have not discussed design space exploration under sparse conditions. In this article, we propose a novel accelerator called WinoNN. We fully discuss the challenges of supporting WinoA, weight sparsity, and activation sparsity simultaneously. To minimize the online encoding overhead caused by activation sparsity, we propose an efficient encoding format called the multibit mask (MBM). To handle the irregularity of sparse data, we propose a novel Scatter-Compute-Gather method in the hardware design, combined with a freely sliding buffer that achieves fine-grained data loading and minimizes boundary waste. Finally, we combine theoretical analysis and experimental methods to explore the design space, allowing WinoNN to achieve the best performance on a specific FPGA. Our highly scalable design enables us to deploy sparse Winograd accelerators on very small embedded FPGAs, which previous works do not support. Experimental results on VGG16 show that we achieve the highest digital signal processing unit (DSP) efficiency and the highest energy efficiency compared with state-of-the-art sparse architectures.
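To make the "fast Winograd algorithm" concrete, the following is a minimal 1-D Winograd F(2,3) sketch: it computes two outputs of a 3-tap convolution with 4 multiplications instead of the 6 a direct convolution needs. This illustrates only the generic fast algorithm; the paper's tiled 2-D variant and its sparse adaptation are not detailed in the abstract.

```python
def winograd_f23(d, g):
    """Winograd F(2,3): d has 4 input values, g has 3 filter taps;
    returns 2 convolution outputs using 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (precomputable offline when weights are fixed).
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Elementwise products in the transformed domain (the 4 multiplies).
    m0 = (d0 - d2) * u0
    m1 = (d1 + d2) * u1
    m2 = (d2 - d1) * u2
    m3 = (d1 - d3) * u3
    # Inverse (output) transform.
    return [m0 + m1 + m2, m1 - m2 - m3]

def direct_conv3(d, g):
    """Reference: direct sliding-window 3-tap convolution (6 multiplies)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

For example, `winograd_f23([1, 2, 3, 4], [1, 0, -1])` matches `direct_conv3([1, 2, 3, 4], [1, 0, -1])`, returning `[-2.0, -2.0]`.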
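The abstract names the multibit mask (MBM) format but does not specify its layout, so the sketch below shows only the generic idea behind mask-based sparse encoding: a per-element bitmask marking nonzero positions plus a packed list of the nonzero values, which lets the decoder skip zeros without storing explicit indices. The function names and the single-bit-per-element layout are illustrative assumptions, not the paper's actual format.

```python
def mask_encode(vals):
    """Encode a dense vector as (bitmask, packed nonzero values).
    Bit i of the mask is set iff vals[i] is nonzero."""
    mask = 0
    nonzeros = []
    for i, v in enumerate(vals):
        if v != 0:
            mask |= 1 << i
            nonzeros.append(v)
    return mask, nonzeros

def mask_decode(mask, nonzeros, length):
    """Reconstruct the dense vector from (bitmask, nonzeros)."""
    out = []
    it = iter(nonzeros)
    for i in range(length):
        out.append(next(it) if (mask >> i) & 1 else 0)
    return out
```

Round-tripping a sparse activation vector such as `[0, 5, 0, 3, 0, 0, 7, 0]` yields mask `0b1001010` and packed values `[5, 3, 7]`, and decoding recovers the original vector.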
