IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Auto-Tuning CNNs for Coarse-Grained Reconfigurable Array-Based Accelerators



Abstract

As more and more deep learning tasks are pushed to mobile devices, accelerators for running these networks efficiently gain in importance. We show that an existing class of general-purpose accelerators, modulo-scheduled coarse-grained reconfigurable array (CGRA) processors typically used to accelerate multimedia workloads, can be a viable alternative to dedicated deep neural network processing hardware. To this end, an auto-tuning compiler is presented that maps convolutional neural networks (CNNs) efficiently onto such architectures. The auto-tuner analyzes the structure of the CNN and the features of the CGRA, then explores the large optimization space to generate code that allows for an efficient mapping of the network. Evaluated with various CNNs, the auto-tuned code achieves an 11-fold speedup over the initial mapping. Comparing the energy per inference, the CGRA outperforms other general-purpose accelerators and an ARMv8 processor by a significant margin.
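The auto-tuning flow described above (analyze the CNN layer and the CGRA, search a large configuration space, keep the best mapping) can be illustrated with a minimal sketch. This is not the paper's compiler: all names (ConvLayer, candidate_tilings, estimate_cycles), the PE count, and the cost model are hypothetical assumptions for illustration; a real implementation would evaluate actual modulo-scheduled kernels on the CGRA.

```python
# Minimal sketch of a per-layer auto-tuning loop, assuming a hypothetical
# cost model. Enumerates candidate loop tilings for one convolution layer
# and keeps the configuration with the lowest estimated cycle count.
from dataclasses import dataclass
from itertools import product


@dataclass
class ConvLayer:
    out_channels: int
    in_channels: int
    out_h: int
    out_w: int
    k: int  # kernel size


def candidate_tilings(layer, pe_count):
    """Enumerate tile sizes that divide the output dimensions evenly."""
    def divisors(n):
        return [d for d in range(1, n + 1) if n % d == 0]
    for tc, th, tw in product(divisors(layer.out_channels),
                              divisors(layer.out_h),
                              divisors(layer.out_w)):
        if tc * th * tw <= pe_count * 4:  # rough capacity filter (assumption)
            yield (tc, th, tw)


def estimate_cycles(layer, tiling, pe_count):
    """Placeholder cost model; the paper maps and measures real kernels."""
    tc, th, tw = tiling
    work = (layer.out_channels * layer.out_h * layer.out_w
            * layer.in_channels * layer.k * layer.k)
    parallel = min(tc * th * tw, pe_count)
    tiles = ((layer.out_channels // tc) * (layer.out_h // th)
             * (layer.out_w // tw))
    overhead = 100 * tiles  # assumed per-tile setup/reconfiguration cost
    return work // parallel + overhead


def autotune(layer, pe_count=16):
    """Exhaustively search the tiling space and return the best candidate."""
    best = None
    for tiling in candidate_tilings(layer, pe_count):
        cycles = estimate_cycles(layer, tiling, pe_count)
        if best is None or cycles < best[1]:
            best = (tiling, cycles)
    return best


if __name__ == "__main__":
    layer = ConvLayer(out_channels=64, in_channels=32, out_h=28, out_w=28, k=3)
    print(autotune(layer))
```

A production tuner would replace the placeholder cost model with measurements or an architecture-aware estimate and extend the search space to unrolling, data layout, and scheduling choices.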
