FCN-Engine: Accelerating Deconvolutional Layers in Classic CNN Processors

Abstract

Unlike standard Convolutional Neural Networks (CNNs) with fully-connected layers, Fully Convolutional Neural Networks (FCNs) are prevalent in computer vision applications such as object detection, semantic/image segmentation, and the most popular generative tasks based on Generative Adversarial Networks (GANs). In an FCN, traditional convolutional layers and deconvolutional layers account for the majority of the computational complexity. However, prior deep learning accelerator designs mostly focus on CNN optimization. They either use independent compute resources to handle deconvolution or convert deconvolutional layers (Deconv) into general convolution operations, which incurs considerable overhead. To address this problem, we propose a unified fully convolutional accelerator that handles both the deconvolutional and convolutional layers with a single processing element (PE) array. We re-optimize the conventional CNN accelerator architecture, built on a regular 2D array of processing elements, so that it supports the data flow of deconvolutional layer inference more efficiently. By exploiting the locality in deconvolutional filters, this architecture reduces on-chip memory communication from 24.79 GB to 6.56 GB and improves the power efficiency significantly. Compared to a prior baseline deconvolution acceleration scheme, the proposed accelerator achieves 1.3×-44.9× speedup and reduces the energy consumption by 14.6%-97.6% on a set of representative benchmark applications. Meanwhile, it keeps CNN inference performance similar to that of an optimized CNN-only accelerator, with negligible power consumption and chip area overhead.
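The overhead of converting a deconvolutional layer into a general convolution can be illustrated with a small software sketch. This is not the accelerator design from the paper. It assumes a single channel, no output padding, and integer data, and the function names `deconv_scatter` and `deconv_zero_insert` are hypothetical. The first function computes a transposed convolution directly, while the second inserts zeros between input samples and then runs an ordinary convolution. Both return the same output, but the second performs many multiplications on inserted zeros.

```python
def deconv_scatter(x, w, stride):
    """Direct transposed convolution: each input pixel scatters a scaled copy of the kernel."""
    h, wd, k = len(x), len(x[0]), len(w)
    oh, ow = (h - 1) * stride + k, (wd - 1) * stride + k
    out = [[0] * ow for _ in range(oh)]
    macs = 0  # multiply-accumulate operations actually performed
    for i in range(h):
        for j in range(wd):
            for u in range(k):
                for v in range(k):
                    out[i * stride + u][j * stride + v] += x[i][j] * w[u][v]
                    macs += 1
    return out, macs


def deconv_zero_insert(x, w, stride):
    """Same result via zero insertion followed by an ordinary sliding-window convolution."""
    h, wd, k = len(x), len(x[0]), len(w)
    pad = k - 1
    ph = (h - 1) * stride + 1 + 2 * pad
    pw = (wd - 1) * stride + 1 + 2 * pad
    # Place the input samples `stride` apart inside a zero-filled, zero-padded map.
    padded = [[0] * pw for _ in range(ph)]
    for i in range(h):
        for j in range(wd):
            padded[pad + i * stride][pad + j * stride] = x[i][j]
    # The sliding window must use the kernel flipped in both dimensions.
    flipped = [row[::-1] for row in w[::-1]]
    oh, ow = ph - k + 1, pw - k + 1
    out = [[0] * ow for _ in range(oh)]
    macs = 0
    for r in range(oh):
        for c in range(ow):
            acc = 0
            for u in range(k):
                for v in range(k):
                    acc += padded[r + u][c + v] * flipped[u][v]
                    macs += 1  # counted even when the operand is an inserted zero
            out[r][c] = acc
    return out, macs


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]                      # 2x2 input feature map
    w = [[1, 0, -1], [2, 1, 0], [0, 3, 1]]    # 3x3 kernel
    direct, direct_macs = deconv_scatter(x, w, stride=2)
    converted, converted_macs = deconv_zero_insert(x, w, stride=2)
    print(direct == converted, direct_macs, converted_macs)
```

For this 2x2 input, 3x3 kernel, and stride of 2, the direct form performs 36 multiply-accumulates and the zero-insertion form performs 225 for an identical 5x5 output. These counts describe only this toy software model and say nothing about the hardware results reported in the abstract.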

Bibliographic details

Source
IEEE/ACM International Conference on Computer-Aided Design | 2018 | pp. 1-6 | 6 pages
Authors
Dawen Xu; Kaijie Tu; Ying Wang; Cheng Liu; Bingsheng He; Huawei Li
Original format: PDF
Keywords
Deconvolution; Convolution; Acceleration; Arrays; Hardware; Two dimensional displays
