IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Uni-OPU: An FPGA-Based Uniform Accelerator for Convolutional and Transposed Convolutional Networks


Abstract

In this article, we design the first full software/hardware stack, called Uni-OPU, for efficient uniform hardware acceleration of different types of transposed convolutional (TCONV) networks and conventional convolutional (CONV) networks. Specifically, a software compiler is provided to transform the computation of various TCONV layers, i.e., zero-inserting-based TCONV (zero-TCONV) and nearest-neighbor resizing-based TCONV (NN-TCONV), as well as CONV layers, into the same pattern. The compiler conducts the following optimizations: 1) eliminating up to 98.4% of the operations in TCONV by exploiting the fixed pattern of TCONV upsampling; 2) decomposing and reformulating TCONV and CONV into streaming parallel vector multiplication with a uniform address generation scheme and data flow pattern; and 3) efficient scheduling and instruction compilation to map networks onto a hardware processor. An instruction-based hardware acceleration processor is developed to efficiently speed up our uniform computation pattern, with throughput up to 2.35 TOPS for the TCONV layer while consuming only 2.89 W of dynamic power. We evaluate Uni-OPU on a benchmark set composed of six TCONV networks from different application fields. Extensive experimental results indicate that Uni-OPU achieves 1.45x to 3.68x higher power efficiency than state-of-the-art zero-TCONV accelerators. High acceleration performance is also achieved on NN-TCONV networks, whose acceleration has not been explored before. In summary, we observe 1.90x and 1.63x latency reductions, as well as 15.04x and 12.43x higher power efficiency, on zero-TCONV and NN-TCONV networks on average compared with a Titan Xp GPU. To the best of our knowledge, ours is the first in-depth study to completely unify the computation process of zero-TCONV, NN-TCONV, and CONV layers.
