IEEE International Solid-State Circuits Conference (ISSCC)

14.2 DNPU: An 8.1TOPS/W reconfigurable CNN-RNN processor for general-purpose deep neural networks



Abstract

Recently, deep learning with convolutional neural networks (CNNs) and recurrent neural networks (RNNs) has become ubiquitous across a wide range of applications. CNNs are used to support vision recognition and processing, while RNNs can recognize time-varying entities and support generative models. Combining CNNs and RNNs makes it possible to recognize time-varying visual entities, such as actions and gestures, and to support image captioning [1]. However, the computational requirements of CNNs are quite different from those of RNNs. Fig. 14.2.1 shows a computation and weight-size analysis of convolution layers (CLs), fully-connected layers (FCLs), and RNN-LSTM layers (RLs). While CLs require a massive amount of computation with a relatively small number of filter weights, FCLs and RLs require a relatively small amount of computation with a huge number of weights. Therefore, when FCLs and RLs are accelerated on SoCs specialized for CLs, they suffer from high memory-transaction costs, low PE utilization, and a mismatch of computational patterns. Conversely, when CLs are accelerated on FCL- and RL-dedicated SoCs, they cannot exploit weight reuse or achieve the required throughput. So far, prior works have considered acceleration of either CLs [2-4] or FCLs and RLs [5]; there has been no work on a combined CNN-RNN processor. A highly reconfigurable CNN-RNN processor with high energy efficiency is therefore desirable to support general-purpose deep neural networks (DNNs).
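The compute-versus-weight asymmetry described above can be made concrete with a back-of-envelope count of multiply-accumulate operations (MACs) and weights. The layer shapes below are hypothetical examples chosen for illustration, not figures from the paper: a convolution layer reuses its small filter across every output position, so MACs vastly outnumber weights, whereas a fully-connected layer touches each weight exactly once per input.

```python
# Illustrative MAC/weight comparison for a convolution layer (CL)
# vs. a fully-connected layer (FCL). Layer shapes are hypothetical.

def conv_layer_stats(h, w, c_in, c_out, k):
    """MACs and weight count for a k x k convolution producing an h x w x c_out output."""
    weights = k * k * c_in * c_out
    macs = h * w * weights  # the same filter weights are reused at every output position
    return macs, weights

def fc_layer_stats(n_in, n_out):
    """MACs and weight count for a fully-connected layer."""
    weights = n_in * n_out
    macs = weights          # each weight is used exactly once per input vector
    return macs, weights

cl_macs, cl_w = conv_layer_stats(h=56, w=56, c_in=64, c_out=64, k=3)
fc_macs, fc_w = fc_layer_stats(n_in=4096, n_out=4096)

print(f"CL : {cl_macs/1e6:6.1f} M MACs, {cl_w/1e6:6.2f} M weights")
print(f"FCL: {fc_macs/1e6:6.1f} M MACs, {fc_w/1e6:6.2f} M weights")
```

With these example shapes, the CL performs roughly 115.6 M MACs using only ~37 K weights, while the FCL performs ~16.8 M MACs but must fetch ~16.8 M weights, which is why a CL-optimized SoC with heavy on-chip reuse fares poorly on FCLs/RLs, and vice versa.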
