IEEE Transactions on Neural Networks and Learning Systems

NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps


Abstract

Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though graphics processing units (GPUs) are most often used in training and deploying CNNs, their power efficiency is less than 10 GOp/s/W for single-frame runtime inference. We propose a flexible and efficient CNN accelerator architecture called NullHop that implements SOA CNNs useful for low-power and low-latency application scenarios. NullHop exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across kernel sizes ranging from 1×1 to 7×7. NullHop can process up to 128 input and 128 output feature maps per layer in a single pass. We implemented the proposed architecture on a Xilinx Zynq field-programmable gate array (FPGA) platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs, ranging from small networks up to the widely known large VGG16 and VGG19 CNNs. Postsynthesis simulations using Mentor Modelsim in a 28-nm process with a clock frequency of 500 MHz show that the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop achieves an efficiency of 368%, maintains over 98% utilization of the multiply-accumulate units, and achieves a power efficiency of over 3 TOp/s/W in a core area of 6.3 mm². As further proof of NullHop's usability, we interfaced its FPGA implementation with a neuromorphic event camera for real-time interactive demonstrations.
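The sparse feature-map representation the abstract refers to stores a per-pixel zero mask alongside a list of only the non-zero activations, so memory traffic scales with the number of non-zeros rather than the map size. A minimal sketch of that encoding idea is below; the function names, NumPy formulation, and 16-bit activation width are illustrative assumptions for this page, not the accelerator's actual RTL or packing format:

```python
import numpy as np

def sparsity_map_encode(fmap):
    """Encode a feature map as a binary sparsity map plus a list of
    non-zero values (illustrative sketch of the compression idea;
    hardware would pack the mask bits into memory words)."""
    mask = (fmap != 0)         # 1 bit per pixel
    values = fmap[mask]        # only non-zero activations are stored
    return mask, values

def sparsity_map_decode(mask, values, dtype=np.int16):
    """Reconstruct the dense feature map from mask + value list."""
    fmap = np.zeros(mask.shape, dtype=dtype)
    fmap[mask] = values
    return fmap

if __name__ == "__main__":
    # Example: a post-ReLU feature map that is mostly zeros
    rng = np.random.default_rng(0)
    fmap = np.maximum(rng.integers(-100, 34, size=(8, 8)), 0).astype(np.int16)

    mask, values = sparsity_map_encode(fmap)
    assert np.array_equal(sparsity_map_decode(mask, values), fmap)

    # Rough storage comparison, assuming 16-bit activations and 1-bit mask entries
    dense_bits = fmap.size * 16
    sparse_bits = mask.size * 1 + values.size * 16
    print(f"non-zeros: {values.size}/{fmap.size}, "
          f"compressed to {100 * sparse_bits / dense_bits:.0f}% of dense size")
```

Because ReLU layers typically leave well over half of all activations at zero, this kind of mask-plus-values encoding is what lets the accelerator both skip multiplications on zeros and shrink off-chip transfers.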


