IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Uni-OPU: An FPGA-Based Uniform Accelerator for Convolutional and Transposed Convolutional Networks


Abstract

In this article, we design the first full software/hardware stack, called Uni-OPU, for efficient uniform hardware acceleration of different types of transposed convolutional (TCONV) networks and conventional convolutional (CONV) networks. Specifically, a software compiler is provided to transform the computation of various TCONV layers, i.e., zero-inserting-based TCONV (zero-TCONV) and nearest-neighbor resizing-based TCONV (NN-TCONV), as well as CONV layers, into the same pattern. The compiler conducts the following optimizations: 1) eliminating up to 98.4% of the operations in TCONV by exploiting the fixed pattern of TCONV upsampling; 2) decomposing and reformulating TCONV and CONV into streaming parallel vector multiplication with a uniform address generation scheme and data flow pattern; and 3) efficient scheduling and instruction compilation to map networks onto a hardware processor. An instruction-based hardware acceleration processor is developed to efficiently speed up our uniform computation pattern, with throughput up to 2.35 TOPS for the TCONV layer while consuming only 2.89 W of dynamic power. We evaluate Uni-OPU on a benchmark set composed of six TCONV networks from different application fields. Extensive experimental results indicate that Uni-OPU achieves 1.45x to 3.68x higher power efficiency than state-of-the-art zero-TCONV accelerators. High acceleration performance is also achieved on NN-TCONV networks, whose acceleration has not been explored before. In summary, we observe 1.90x and 1.63x latency reductions, as well as 15.04x and 12.43x higher power efficiency, on zero-TCONV and NN-TCONV networks on average compared with a Titan Xp GPU. To the best of our knowledge, ours is the first in-depth study to completely unify the computation process of zero-TCONV, NN-TCONV, and CONV layers.
