Journal: Turkish Journal of Electrical Engineering and Computer Sciences

Optimizing FPGA-based CNN accelerator for energy efficiency with an extended Roofline model



Abstract

Authors: Sayed Omid Ayat, Mohamed Khalil-Hani, Ab Al-Hadi Ab Rahman

In recent years, the convolutional neural network (CNN) has found wide acceptance in solving practical computer vision and image recognition problems. More recently, owing to its flexibility, short development time, and energy efficiency, the field-programmable gate array (FPGA) has become an attractive platform for exploiting the inherent parallelism of the CNN feedforward pass. However, to meet the accuracy demands of today's practical recognition applications, which typically involve massive datasets, CNNs must become larger and deeper. Enlarging the CNN aggravates the off-chip memory bottleneck on the FPGA platform, since there is not enough on-chip space to hold large datasets. In this work, we propose a memory system architecture that best matches the off-chip memory traffic to the optimum throughput of the computation engine while operating at the maximum allowable frequency. With the help of an extended version of the Roofline model proposed in this work, we can estimate the memory bandwidth utilization of the system at different operating frequencies, since the proposed model considers operating frequency in addition to bandwidth utilization and throughput. To find the solution with the best energy efficiency, we make a trade-off between energy efficiency and computational throughput: the chosen design point saves 18% in energy utilization at the cost of less than a 2% reduction in throughput performance. We also propose a race-to-halt strategy to further improve the energy efficiency of the designed CNN accelerator.

Experimental results show that our CNN accelerator achieves a peak performance of 52.11 GFLOPS and an energy efficiency of 10.02 GFLOPS/W on a ZYNQ ZC706 FPGA board running at 250 MHz, outperforming most previous approaches.

Keywords: convolutional neural network, field-programmable gate array, energy efficiency, Roofline model, race-to-halt strategy
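The abstract's key idea, a Roofline bound that makes clock frequency an explicit parameter, can be illustrated with a minimal sketch. In the classical Roofline model, attainable performance is the minimum of the compute roof and the bandwidth ceiling; since an FPGA engine's compute roof scales with its clock, making frequency explicit lets one compare design points at different clocks. All parameter values below (ops per cycle, DDR bandwidth, operational intensity) are illustrative assumptions, not figures from the paper:

```python
def attainable_gflops(op_intensity, peak_ops_per_cycle, freq_mhz, mem_bw_gbs):
    """Roofline bound with operating frequency made explicit (illustrative).

    op_intensity       -- operations per byte of off-chip traffic (FLOP/byte)
    peak_ops_per_cycle -- compute engine throughput per clock cycle (FLOP/cycle)
    freq_mhz           -- operating frequency in MHz
    mem_bw_gbs         -- off-chip memory bandwidth in GB/s
    """
    compute_roof = peak_ops_per_cycle * freq_mhz / 1000.0  # GFLOP/s, scales with clock
    bandwidth_roof = mem_bw_gbs * op_intensity             # GFLOP/s, clock-independent
    return min(compute_roof, bandwidth_roof)

# Hypothetical engine: 256 FLOP/cycle, 4 GB/s DDR, kernel at 10 FLOP/byte.
for f in (250, 200):
    print(f"{f} MHz -> {attainable_gflops(10.0, 256, f, 4.0):.1f} GFLOPS")
```

In this example the design is memory-bound at both clocks, so lowering the frequency from 250 MHz to 200 MHz leaves attainable throughput unchanged while reducing dynamic power, which is the kind of energy/throughput trade-off the abstract describes. For the reported operating point, 52.11 GFLOPS at 10.02 GFLOPS/W implies roughly 5.2 W of power consumption.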
