Conference: IEEE/ACM International Conference on Computer-Aided Design

FCN-Engine: Accelerating Deconvolutional Layers in Classic CNN Processors

Abstract

Unlike standard Convolutional Neural Networks (CNNs) with fully-connected layers, Fully Convolutional Neural Networks (FCNs) are prevalent in computer vision applications such as object detection, semantic/image segmentation, and the most popular generative tasks based on Generative Adversarial Networks (GANs). In an FCN, traditional convolutional layers and deconvolutional layers account for the majority of the computational complexity. However, prior deep learning accelerator designs mostly focus on CNN optimization. They either use independent compute resources to handle deconvolution or convert deconvolutional layers (Deconv) into general convolution operations, which incurs considerable overhead. To address this problem, we propose a unified fully convolutional accelerator that handles both deconvolutional and convolutional layers with a single processing element (PE) array. We re-optimize the conventional CNN accelerator architecture, built on a regular 2D array of processing elements, so that it supports the data flow of deconvolutional layer inference more efficiently. By exploiting the locality in deconvolutional filters, this architecture reduces on-chip memory communication from 24.79 GB to 6.56 GB and significantly improves power efficiency. Compared to a prior baseline deconvolution acceleration scheme, the proposed accelerator achieves a 1.3× to 44.9× speedup and reduces energy consumption by 14.6% to 97.6% on a set of representative benchmark applications. Meanwhile, it maintains CNN inference performance similar to that of an optimized CNN-only accelerator, with negligible power consumption and chip area overhead.
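The overhead of converting a deconvolutional layer into a general convolution can be illustrated with a small 1D sketch. It is not the accelerator described in the abstract: it only assumes the commonly used zero-insertion conversion, and the function names (`deconv1d_scatter`, `deconv1d_zero_insert`, `mac_counts`) are hypothetical helpers introduced for this illustration. The direct form scatters each input element through the filter, while the converted form inserts zeros between input elements and runs a dense convolution, so many of its multiplications involve an inserted zero.

```python
import numpy as np


def deconv1d_scatter(x, w, stride):
    """Direct transposed convolution: each input scatters a scaled filter copy."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n, k = len(x), len(w)
    y = np.zeros((n - 1) * stride + k)
    for i in range(n):
        y[i * stride : i * stride + k] += x[i] * w
    return y


def deconv1d_zero_insert(x, w, stride):
    """Same result via zero insertion followed by an ordinary dense convolution."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(x)
    up = np.zeros((n - 1) * stride + 1)
    up[::stride] = x  # place inputs `stride` apart, zeros in between
    return np.convolve(up, w, mode="full")


def mac_counts(n, k, stride):
    """Multiplications on real inputs vs. multiplications issued by the dense form."""
    useful = n * k
    dense = ((n - 1) * stride + k) * k
    return useful, dense


x = [1.0, 2.0, 3.0]
w = [1.0, 10.0, 100.0]
print(deconv1d_scatter(x, w, stride=2))
print(deconv1d_zero_insert(x, w, stride=2))
print(mac_counts(n=3, k=3, stride=2))
```

Both functions produce the same output, but in this toy case the dense form issues 21 multiplications where only 9 touch a real input value. The gap widens with larger strides.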
