Conference: IEEE International Symposium on Industrial Electronics

A Memory-Optimized and Energy-Efficient CNN Acceleration Architecture Based on FPGA


Abstract

The development of Convolutional Neural Networks (CNNs) has contributed to breakthroughs in the field of artificial intelligence. Compared with traditional algorithms, CNNs offer advantages in speed and accuracy for detection, identification, and classification. GPUs are popular for implementing CNNs because of their computational capacity, but their high power consumption limits their use in embedded applications. Recently, researchers have accelerated CNNs using Field Programmable Gate Arrays (FPGAs), which have been demonstrated to be more energy-efficient than GPUs and are suitable for embedded systems. Although FPGAs offer low power consumption, powerful parallel computing, and high flexibility, bandwidth and memory access become the bottleneck of CNN accelerator design. In this paper, a novel memory-optimized and energy-efficient CNN acceleration architecture is proposed. The paper analyzes the on-chip and off-chip memory resources of the FPGA and proposes a memory optimization solution that uses a specially mixed operation of FIFO and ping-pong buffering. To ensure accuracy, a float-16 CNN model is used to test the framework, which is evaluated on the Xilinx ZCU102 platform, a device with both an Arm core and an FPGA on one chip. In tests with the VGG-16 network and an FCN network with 500 MB of weights, the architecture is 10 times faster than a CPU and has better energy efficiency than a GPU.
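The abstract does not describe how the FIFO and ping-pong operations are combined, so the following is only a minimal software sketch of the general ping-pong (double-buffering) idea, fed from a FIFO. It is not the paper's architecture. The names `run_ping_pong` and `tile_sum`, and the use of small Python lists as "tiles", are assumptions made purely for illustration.

```python
from collections import deque


def run_ping_pong(tiles, compute):
    """Process tiles using two alternating buffers fed from a FIFO.

    Hypothetical illustration only: `tiles` stands in for data streamed
    from off-chip memory, and `compute` stands in for the accelerator.
    """
    fifo = deque(tiles)      # models a stream arriving from off-chip memory
    buffers = [None, None]   # the "ping" and "pong" on-chip buffers
    results = []
    active = 0               # index of the buffer the compute stage reads

    # Prefetch the first tile so compute has data on the first step.
    if fifo:
        buffers[active] = fifo.popleft()

    while buffers[active] is not None:
        idle = 1 - active
        # In hardware, this load would overlap with the compute below.
        # Here the two steps run sequentially, modelling only the data flow.
        buffers[idle] = fifo.popleft() if fifo else None
        results.append(compute(buffers[active]))
        buffers[active] = None   # the consumed buffer is free to refill
        active = idle            # swap the roles of the two buffers
    return results


def tile_sum(tile):
    """Toy compute stage: sum the values in one tile."""
    return sum(tile)


outputs = run_ping_pong([[1, 2], [3, 4], [5, 6]], tile_sum)
print(outputs)  # [3, 7, 11]
```

The point of the pattern is that the compute stage never waits on an empty buffer as long as loading one tile takes no longer than processing one. That is why it is commonly used to hide off-chip memory latency.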
