IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

MALOC: A Fully Pipelined FPGA Accelerator for Convolutional Neural Networks With All Layers Mapped on Chip



Abstract

Recently, field-programmable gate arrays (FPGAs) have been widely used to implement hardware accelerators for convolutional neural networks (CNNs). However, most existing accelerators follow the same design philosophy as their ASIC counterparts: operations from all layers are mapped onto the same hardware units and executed in a time-multiplexed fashion. This approach does not take full advantage of the reconfigurability and customizability of FPGAs, degrading computational efficiency to some degree. In this paper, we propose a new FPGA-based CNN accelerator architecture that maps every layer to its own on-chip unit, with all units working concurrently as a pipeline. We also propose a comprehensive mapping and optimization methodology, built on a roofline-model-oriented optimization model, that achieves maximum resource utilization together with optimal computational efficiency. In addition, to ease the programming burden, we present a design framework that offers developers a one-stop flow for generating accelerators with our optimization methodology. We evaluate our proposal by implementing several modern CNN models on Xilinx Zynq-7020 and Virtex-7 690t FPGA platforms. Experimental results show that our implementations achieve a peak performance of 910.2 GOP/s on the Virtex-7 690t and an energy efficiency of 36.36 GOP/s/W on the Zynq-7020, both superior to previous approaches.
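The abstract's optimization methodology is built on the roofline model, which bounds a kernel's attainable throughput by the platform's compute peak and by its memory bandwidth scaled by the kernel's computation-to-communication ratio. A minimal sketch of that bound (the function name, parameter values, and numbers below are illustrative, not taken from the paper):

```python
def roofline_attainable_gops(peak_gops, bandwidth_gbs, ops, bytes_accessed):
    """Attainable performance (GOP/s) of a layer under the roofline model.

    peak_gops      : platform computational peak (GOP/s)
    bandwidth_gbs  : off-chip memory bandwidth (GB/s)
    ops            : total operations the layer performs
    bytes_accessed : total off-chip bytes the layer moves
    """
    # Computation-to-communication ratio (operations per byte).
    ctc_ratio = ops / bytes_accessed
    # The kernel is limited by whichever roof it hits first.
    return min(peak_gops, ctc_ratio * bandwidth_gbs)

# Illustrative platform: 900 GOP/s peak, 12.8 GB/s bandwidth.
# A memory-bound layer (2.5 op/byte) vs. a compute-bound layer (100 op/byte):
mem_bound = roofline_attainable_gops(900, 12.8, ops=1e9, bytes_accessed=4e8)
cmp_bound = roofline_attainable_gops(900, 12.8, ops=1e9, bytes_accessed=1e7)
print(mem_bound)  # 32.0  -> bandwidth roof dominates
print(cmp_bound)  # 900   -> compute roof dominates
```

Mapping all layers on chip, as the paper proposes, raises each layer's effective computation-to-communication ratio by keeping intermediate feature maps in on-chip buffers, pushing layers toward the compute roof.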

