
FPGA-accelerated deep convolutional neural networks for high throughput and energy efficiency



Abstract

Recent breakthroughs in deep convolutional neural networks (CNNs) have led to great improvements in the accuracy of both vision and auditory systems. Characterized by their deep structures and large numbers of parameters, deep CNNs challenge the computational capability of today's hardware. Hardware specialization in the form of the field-programmable gate array (FPGA) offers a promising path toward major leaps in computational performance while achieving high energy efficiency. In this paper, we focus on accelerating deep CNNs on the Xilinx Zynq-zq7045 FPGA SoC. As most of the computational workload can be converted to matrix multiplications, we adopt a matrix-multiplier-based accelerator architecture, with dedicated units designed to eliminate the conversion overhead. We also design a customized memory system tailored to the memory access pattern of CNNs. To make the accelerator easily usable by application developers, it supports Caffe, a widely used software framework for deep CNNs; different CNN models can be adopted with good performance portability. The experimental results show that for image classification, a typical CNN application, an average throughput of 77.8 GFLOPS is achieved, while the energy efficiency is 4.7× better than that of an Nvidia K20 GPGPU.
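The abstract notes that most of a CNN's workload can be converted to matrix multiplications. A minimal sketch of that lowering (the standard im2col transformation, shown here in plain NumPy; function names are illustrative, not from the paper) turns a convolution over a (C, H, W) feature map into a single GEMM:

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold a (C, H, W) feature map into a (C*kh*kw, out_h*out_w) matrix
    so that convolution becomes one matrix multiplication (stride 1, no padding)."""
    C, H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((C * kh * kw, out_h * out_w))
    row = 0
    for c in range(C):
        for i in range(kh):
            for j in range(kw):
                # each row holds one (channel, kernel-offset) slice across all windows
                cols[row] = x[c, i:i + out_h, j:j + out_w].reshape(-1)
                row += 1
    return cols

def conv_as_matmul(x, weights):
    """weights: (K, C, kh, kw). One GEMM: (K, C*kh*kw) @ (C*kh*kw, out_h*out_w)."""
    K, C, kh, kw = weights.shape
    cols = im2col(x, kh, kw)
    out = weights.reshape(K, -1) @ cols
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    return out.reshape(K, out_h, out_w)
```

On an FPGA, the unfold step is what the paper's dedicated units would handle in hardware, so the matrix multiplier never pays the conversion overhead in software.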

Bibliographic details

  • Source
    Concurrency and Computation | 2017, Issue 20 | e3850.1-e3850.20 | 20 pages
  • Author affiliations

    Department of Computer, State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, Hunan, China (listed identically for each author)

  • Indexing information
  • Format: PDF
  • Language: English
  • CLC classification
  • Keywords

    CNN; Accelerator; FPGA; Matrix Multiplier; Caffe;

