Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Stride 2 1-D, 2-D, and 3-D Winograd for Convolutional Neural Networks

Abstract

Convolutional neural networks (CNNs) have been widely adopted for computer vision applications. CNNs require many multiplications, making their use expensive in terms of both computational complexity and hardware. An effective way to reduce the number of required multiplications is the Winograd algorithm. Previous implementations of CNNs based on Winograd use the 2-D algorithm F(2 x 2, 3 x 3), which reduces computational complexity by a factor of 2.25 over regular convolution. However, current Winograd implementations only apply when using a stride (the shift displacement of a kernel over an input) of 1. In this article, we presented a novel method to apply the Winograd algorithm to a stride of 2. This method is valid for one, two, or three dimensions. We also introduced new Winograd versions compatible with kernels of size 3, 5, and 7. The algorithms were successfully implemented on an NVIDIA K20c GPU. Compared to regular convolutions, the implementations for stride 2 are 1.44x faster for a 3 x 3 kernel, 2.04x faster for a 5 x 5 kernel, 2.42x faster for a 7 x 7 kernel, and 1.73x faster for a 3 x 3 x 3 kernel. Additionally, a CNN accelerator was implemented on an Intel Arria-10 field-programmable gate array (FPGA); it uses a novel processing element (PE) that performs either two 2-D Winograd stride-1 operations or one 2-D Winograd stride-2 operation per clock cycle. We accelerated the original and our proposed modified VGG-16 architectures and achieved digital signal processor (DSP) efficiencies of 1.22 giga operations per second (GOPS)/DSP and 1.33 GOPS/DSP, respectively.
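
To make the multiplication counts in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows the standard 1-D Winograd F(2, 3) kernel, which produces 2 outputs of a 3-tap filter with 4 multiplications instead of 6, and the resulting reduction factors when the algorithm is nested over several dimensions: (2·3)^2 / (2+3−1)^2 = 36/16 = 2.25 for F(2 x 2, 3 x 3). The final function illustrates one generic way to express a stride-2 convolution as stride-1 pieces, using an even/odd (polyphase) split of the input. That split is an assumption chosen for illustration and may differ from the construction in the article. All function names are hypothetical.

```python
from fractions import Fraction


def direct_corr(d, g, stride=1):
    """Sliding dot product as used in CNN layers (no kernel flip)."""
    n_out = (len(d) - len(g)) // stride + 1
    return [sum(d[i * stride + k] * g[k] for k in range(len(g)))
            for i in range(n_out)]


def winograd_f2_3(d, g):
    """Standard F(2, 3): 2 outputs from a 4-sample tile with 4 multiplications."""
    d0, d1, d2, d3 = (Fraction(x) for x in d)
    g0, g1, g2 = (Fraction(x) for x in g)
    # Filter transform; in a CNN this is precomputed once per filter.
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    # The four element-wise multiplications.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * g2
    # Output transform (additions only).
    return [m1 + m2 + m3, m2 - m3 - m4]


def mult_reduction(m, r, dims):
    """Multiplication ratio, direct vs. nested minimal F(m, r), in `dims` dimensions."""
    direct = (m * r) ** dims        # m outputs per axis, r taps per axis
    fast = (m + r - 1) ** dims      # minimal 1-D count, nested per axis
    return Fraction(direct, fast)


def stride2_via_phases(d, g):
    """Illustrative assumption: stride-2, 3-tap correlation as two stride-1 pieces.

    y[i] = d[2i]*g0 + d[2i+1]*g1 + d[2i+2]*g2
         = (even samples correlated with [g0, g2])[i] + (odd samples * g1)[i]
    """
    g0, g1, g2 = g
    even, odd = d[0::2], d[1::2]
    a = direct_corr(even, [g0, g2])   # stride-1, 2-tap
    b = direct_corr(odd, [g1])        # stride-1, 1-tap
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    print(winograd_f2_3([1, 2, 3, 4], [1, 0, -1]) == direct_corr([1, 2, 3, 4], [1, 0, -1]))
    print(mult_reduction(2, 3, 2))    # 9/4, i.e. the 2.25 quoted for F(2 x 2, 3 x 3)
    print(stride2_via_phases([1, 2, 3, 4, 5, 6, 7, 8], [2, 3, 5]))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the comparison with direct convolution is an equality check and not a floating-point tolerance check. Once a stride-2 convolution is written as stride-1 pieces, each piece can in principle be handed to a stride-1 fast algorithm, which is the general idea that makes Winograd-style savings available at stride 2.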