IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

WinoNN: Optimizing FPGA-Based Convolutional Neural Network Accelerators Using Sparse Winograd Algorithm



Abstract

In recent years, a variety of FPGA-based accelerators have been proposed to speed up convolutional neural networks (CNNs) in many domain-specific application fields. In addition, optimization algorithms such as fast convolution algorithms and network sparsity have greatly reduced the theoretical computational workload of CNN inference. A few FPGA accelerators currently support both the fast Winograd algorithm (WinoA) and network sparsity to minimize the amount of computation. However, on the one hand, these architectures feed data into processing elements (PEs) in units of blocks, so boundary losses caused by sparse irregularity cannot be avoided. On the other hand, these works have not discussed design space exploration under sparse conditions. In this article, we propose a novel accelerator called WinoNN. We fully discuss the challenges of supporting WinoA, weight sparsity, and activation sparsity simultaneously. To minimize the online encoding overhead caused by activation sparsity, we propose an efficient encoding format called the multibit mask (MBM). To handle the irregularity of sparse data, we propose a novel Scatter-Compute-Gather method in the hardware design, combined with a freely sliding buffer that achieves fine-grained data loading and minimizes boundary waste. Finally, we combine theoretical analysis and experimental methods to explore the design space, allowing WinoNN to achieve the best performance on a specific FPGA. Our highly scalable design enables us to deploy sparse Winograd accelerators on very small embedded FPGAs, which previous works do not support. Experimental results on VGG16 show that we achieve the highest digital signal processing unit (DSP) efficiency and the highest energy efficiency compared with state-of-the-art sparse architectures.
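To make the "fast Winograd algorithm" concrete, the following is a minimal 1-D Winograd F(2,3) sketch: it computes two outputs of a 3-tap convolution with 4 multiplications instead of the 6 a direct convolution needs. This illustrates only the generic fast algorithm; the paper's tiled 2-D variant and its sparse adaptation are not detailed in the abstract.

```python
def winograd_f23(d, g):
    """Winograd F(2,3): d has 4 input values, g has 3 filter taps;
    returns 2 convolution outputs using 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (precomputable offline when weights are fixed).
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # Elementwise products in the transformed domain (the 4 multiplies).
    m0 = (d0 - d2) * u0
    m1 = (d1 + d2) * u1
    m2 = (d2 - d1) * u2
    m3 = (d1 - d3) * u3
    # Inverse (output) transform.
    return [m0 + m1 + m2, m1 - m2 - m3]

def direct_conv3(d, g):
    """Reference: direct sliding-window 3-tap convolution (6 multiplies)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

For example, `winograd_f23([1, 2, 3, 4], [1, 0, -1])` matches `direct_conv3([1, 2, 3, 4], [1, 0, -1])`, returning `[-2.0, -2.0]`.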
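The abstract names the multibit mask (MBM) format but does not specify its layout, so the sketch below shows only the generic idea behind mask-based sparse encoding: a per-element bitmask marking nonzero positions plus a packed list of the nonzero values, which lets the decoder skip zeros without storing explicit indices. The function names and the single-bit-per-element layout are illustrative assumptions, not the paper's actual format.

```python
def mask_encode(vals):
    """Encode a dense vector as (bitmask, packed nonzero values).
    Bit i of the mask is set iff vals[i] is nonzero."""
    mask = 0
    nonzeros = []
    for i, v in enumerate(vals):
        if v != 0:
            mask |= 1 << i
            nonzeros.append(v)
    return mask, nonzeros

def mask_decode(mask, nonzeros, length):
    """Reconstruct the dense vector from (bitmask, nonzeros)."""
    out = []
    it = iter(nonzeros)
    for i in range(length):
        out.append(next(it) if (mask >> i) & 1 else 0)
    return out
```

Round-tripping a sparse activation vector such as `[0, 5, 0, 3, 0, 0, 7, 0]` yields mask `0b1001010` and packed values `[5, 3, 7]`, and decoding recovers the original vector.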
