
Design of FPGA-Based Accelerator for Convolutional Neural Network under Heterogeneous Computing Framework with OpenCL



Abstract

CPUs lack the resources to compute convolutional neural networks (CNNs) efficiently, especially in embedded applications. Heterogeneous computing platforms such as GPUs, FPGAs, and ASICs are therefore widely used to accelerate CNN tasks. Among these, an FPGA accelerates computation by mapping the algorithm onto parallel hardware rather than running it on a CPU, which cannot fully exploit the parallelism. By fully using the parallelism in the neural network structure, an FPGA can reduce computing cost and increase computing speed. However, FPGA development requires considerable design skill. As a heterogeneous development platform, OpenCL offers a high abstraction level, a short development cycle, and strong portability, which can compensate for a shortage of skilled designers. This paper uses Xilinx SDAccel to realize parallel acceleration of a CNN task and proposes an optimization strategy for a single convolutional layer to accelerate the CNN. Simulation results show that the proposed strategy improves calculation speed. Compared with the baseline design, the single-convolutional-layer strategy increases computing speed 14 times. Performance of the whole CNN task improves 2 times compared with before optimization, and image classification reaches more than 48 fps.

Bibliographic record

  • Source
    International Journal of Reconfigurable Computing | 2018, issue 2018 | 1785892.1-1785892.10 | 10 pages
  • Author affiliations

    Department of Electronic Science and Technology, Beijing Jiaotong University, Beijing, China;

    Department of Electronic Science and Technology, Beijing Jiaotong University, Beijing, China;

    Department of Electronic Engineering, Tsinghua University, Beijing, China;

    Department of Electronic Engineering, Tsinghua University, Beijing, China;

    Department of Electronic Engineering, Tsinghua University, Beijing, China;

    Department of Electronic Science and Technology, Beijing Jiaotong University, Beijing, China;

    China University of Petroleum, Beijing, China;

    Department of Electronic Engineering, Tsinghua University, Beijing, China;

    Department of Mechanical Engineering, Tsinghua University, Beijing, China;

    Department of Electronic Engineering, Tsinghua University, Beijing, China;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • CLC classification
  • Keywords

  • Date added: 2022-08-18 03:55:25
