Journal: Turkish Journal of Electrical Engineering and Computer Sciences

Optimizing FPGA-based CNN accelerator for energy efficiency with an extended Roofline model



Abstract

Authors: Sayed Omid Ayat, Mohamed Khalil-Hani, Ab Al-Hadi Ab Rahman

In recent years, the convolutional neural network (CNN) has found wide acceptance in solving practical computer vision and image recognition problems. More recently, owing to its flexibility, short development time, and energy efficiency, the field-programmable gate array (FPGA) has become an attractive platform for exploiting the inherent parallelism of the CNN feedforward pass. However, to meet the accuracy demands of today's practical recognition applications, which typically involve massive datasets, CNNs must become larger and deeper. Enlarging the CNN aggravates the off-chip memory bottleneck on the FPGA platform, since there is not enough on-chip space to hold large datasets. In this work, we propose a memory system architecture that best matches the off-chip memory traffic to the optimum throughput of the computation engine while operating at the maximum allowable frequency. With the help of an extended version of the Roofline model proposed in this work, we can estimate the memory bandwidth utilization of the system at different operating frequencies, since the proposed model considers operating frequency in addition to bandwidth utilization and throughput. To find the solution with the best energy efficiency, we make a trade-off between energy efficiency and computational throughput: the chosen design point saves 18% in energy utilization at the cost of less than a 2% reduction in throughput performance. We also propose a race-to-halt strategy to further improve the energy efficiency of the designed CNN accelerator.

Experimental results show that our CNN accelerator achieves a peak performance of 52.11 GFLOPS and an energy efficiency of 10.02 GFLOPS/W on a ZYNQ ZC706 FPGA board running at 250 MHz, outperforming most previous approaches.

Keywords: convolutional neural network, field-programmable gate array, energy efficiency, Roofline model, race-to-halt strategy
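The abstract's key idea, a Roofline bound that makes clock frequency an explicit parameter, can be illustrated with a minimal sketch. In the classical Roofline model, attainable performance is the minimum of the compute roof and the bandwidth ceiling; since an FPGA engine's compute roof scales with its clock, making frequency explicit lets one compare design points at different clocks. All parameter values below (ops per cycle, DDR bandwidth, operational intensity) are illustrative assumptions, not figures from the paper:

```python
def attainable_gflops(op_intensity, peak_ops_per_cycle, freq_mhz, mem_bw_gbs):
    """Roofline bound with operating frequency made explicit (illustrative).

    op_intensity       -- operations per byte of off-chip traffic (FLOP/byte)
    peak_ops_per_cycle -- compute engine throughput per clock cycle (FLOP/cycle)
    freq_mhz           -- operating frequency in MHz
    mem_bw_gbs         -- off-chip memory bandwidth in GB/s
    """
    compute_roof = peak_ops_per_cycle * freq_mhz / 1000.0  # GFLOP/s, scales with clock
    bandwidth_roof = mem_bw_gbs * op_intensity             # GFLOP/s, clock-independent
    return min(compute_roof, bandwidth_roof)

# Hypothetical engine: 256 FLOP/cycle, 4 GB/s DDR, kernel at 10 FLOP/byte.
for f in (250, 200):
    print(f"{f} MHz -> {attainable_gflops(10.0, 256, f, 4.0):.1f} GFLOPS")
```

In this example the design is memory-bound at both clocks, so lowering the frequency from 250 MHz to 200 MHz leaves attainable throughput unchanged while reducing dynamic power, which is the kind of energy/throughput trade-off the abstract describes. For the reported operating point, 52.11 GFLOPS at 10.02 GFLOPS/W implies roughly 5.2 W of power consumption.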
