Journal of Low Power Electronics
Optimizing Memory Efficiency for Deep Convolutional Neural Network Accelerators

Abstract

Convolutional Neural Network (CNN) accelerators have achieved nominal performance and energy-efficiency speedups compared to traditional general-purpose CPU- and GPU-based solutions. Although optimizations on computation have been studied intensively, the energy efficiency of such accelerators remains limited by off-chip memory accesses, since their energy cost is orders of magnitude higher than that of other operations. Minimizing off-chip memory access volume, therefore, is the key to further improving energy efficiency. The prior state of the art uses rigid data reuse patterns and is sub-optimal for some, or even all, of the individual convolutional layers. To overcome this problem, this paper proposes an adaptive layer partitioning and scheduling scheme, called SmartShuttle, to minimize off-chip memory accesses for CNN accelerators. SmartShuttle can adaptively switch among different data reuse schemes and the corresponding tiling-factor settings to dynamically match different convolutional layers and fully-connected layers. Moreover, SmartShuttle thoroughly investigates the impact of data reusability and sparsity on the memory access volume. The experimental results show that SmartShuttle processes the convolutional layers at 434.8 multiply-and-accumulate operations (MACs)/DRAM access for VGG16 (batch size = 3), and 526.3 MACs/DRAM access for AlexNet (batch size = 4), which outperforms the state-of-the-art approach (Eyeriss) by 52.2% and 52.6%, respectively.
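The abstract's core idea, choosing a data reuse scheme per layer so that the modeled off-chip (DRAM) access volume is minimized, can be sketched with a toy cost model. The formulas below are illustrative assumptions only, not SmartShuttle's actual analysis, which also accounts for detailed tiling factors, partial-sum traffic, and sparsity; the function and parameter names are hypothetical.

```python
import math

def dram_accesses(ifmap, weights, ofmap, buf, scheme):
    """Toy DRAM-traffic model for one layer (all sizes in words).

    Simplifying assumption: the 'reused' tensor is fetched once and kept
    on-chip tile by tile; the other operand must be streamed from DRAM
    once per on-chip tile of the reused tensor.
    """
    if scheme == "weight_reuse":
        passes = math.ceil(weights / buf)   # ifmap re-reads, one per weight tile
        return weights + passes * ifmap + ofmap
    if scheme == "ifmap_reuse":
        passes = math.ceil(ifmap / buf)     # weight re-reads, one per ifmap tile
        return ifmap + passes * weights + ofmap
    raise ValueError(f"unknown scheme: {scheme}")

def pick_scheme(ifmap, weights, ofmap, buf):
    """Adaptively pick the scheme with the lowest modeled DRAM volume."""
    return min(("weight_reuse", "ifmap_reuse"),
               key=lambda s: dram_accesses(ifmap, weights, ofmap, buf, s))

# Early conv layer: large ifmap, small weights -> pin the weights on-chip.
print(pick_scheme(ifmap=100, weights=10, ofmap=50, buf=20))  # weight_reuse
# FC-like layer: weights dominate -> pin the ifmap on-chip instead.
print(pick_scheme(ifmap=10, weights=100, ofmap=5, buf=20))   # ifmap_reuse
```

Even this crude model reproduces the paper's motivation: no single rigid reuse pattern wins for every layer, so a per-layer (adaptive) choice lowers total DRAM traffic.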
