IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Auto-Tuning CNNs for Coarse-Grained Reconfigurable Array-Based Accelerators



Abstract

As more and more deep learning tasks are pushed to mobile devices, accelerators for running these networks efficiently gain in importance. We show that an existing class of general-purpose accelerators, modulo-scheduled coarse-grained reconfigurable array (CGRA) processors typically used to accelerate multimedia workloads, can be a viable alternative to dedicated deep neural network processing hardware. To this end, an auto-tuning compiler is presented that maps convolutional neural networks (CNNs) efficiently onto such architectures. The auto-tuner analyzes the structure of the CNN and the features of the CGRA, then explores the large optimization space to generate code that allows for an efficient mapping of the network. Evaluated with various CNNs, the auto-tuned code achieves an 11-fold speedup over the initial mapping. Comparing the energy per inference, the CGRA outperforms other general-purpose accelerators and an ARMv8 processor by a significant margin.
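The auto-tuning flow described above (analyze the CNN layer and the CGRA, search a large configuration space, keep the best mapping) can be illustrated with a minimal sketch. This is not the paper's compiler: all names (ConvLayer, candidate_tilings, estimate_cycles), the PE count, and the cost model are hypothetical assumptions for illustration; a real implementation would evaluate actual modulo-scheduled kernels on the CGRA.

```python
# Minimal sketch of a per-layer auto-tuning loop, assuming a hypothetical
# cost model. Enumerates candidate loop tilings for one convolution layer
# and keeps the configuration with the lowest estimated cycle count.
from dataclasses import dataclass
from itertools import product


@dataclass
class ConvLayer:
    out_channels: int
    in_channels: int
    out_h: int
    out_w: int
    k: int  # kernel size


def candidate_tilings(layer, pe_count):
    """Enumerate tile sizes that divide the output dimensions evenly."""
    def divisors(n):
        return [d for d in range(1, n + 1) if n % d == 0]
    for tc, th, tw in product(divisors(layer.out_channels),
                              divisors(layer.out_h),
                              divisors(layer.out_w)):
        if tc * th * tw <= pe_count * 4:  # rough capacity filter (assumption)
            yield (tc, th, tw)


def estimate_cycles(layer, tiling, pe_count):
    """Placeholder cost model; the paper maps and measures real kernels."""
    tc, th, tw = tiling
    work = (layer.out_channels * layer.out_h * layer.out_w
            * layer.in_channels * layer.k * layer.k)
    parallel = min(tc * th * tw, pe_count)
    tiles = ((layer.out_channels // tc) * (layer.out_h // th)
             * (layer.out_w // tw))
    overhead = 100 * tiles  # assumed per-tile setup/reconfiguration cost
    return work // parallel + overhead


def autotune(layer, pe_count=16):
    """Exhaustively search the tiling space and return the best candidate."""
    best = None
    for tiling in candidate_tilings(layer, pe_count):
        cycles = estimate_cycles(layer, tiling, pe_count)
        if best is None or cycles < best[1]:
            best = (tiling, cycles)
    return best


if __name__ == "__main__":
    layer = ConvLayer(out_channels=64, in_channels=32, out_h=28, out_w=28, k=3)
    print(autotune(layer))
```

A production tuner would replace the placeholder cost model with measurements or an architecture-aware estimate and extend the search space to unrolling, data layout, and scheduling choices.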
