Conference: Design, Automation & Test in Europe Conference & Exhibition

Data Locality Optimization of Depthwise Separable Convolutions for CNN Inference Accelerators


Abstract

This paper presents a novel framework that maximizes data reuse in depthwise separable convolutional layers by applying the Scan execution order to the tiled matrix multiplications. In addition, a fusion scheme across layers is proposed to minimize the transfer of intermediate activations, reducing both the latency and the energy consumption of external memory accesses. The experimental results are validated against DRAMSim2 for accurate timing and energy estimation. With a 64K-entry on-chip buffer, the approach reduces DRAM energy by 67% on MobileNet V2.
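The layer-fusion idea can be illustrated with a minimal sketch. It is not the paper's framework: the abstract does not define the tiling or the Scan order, so neither is modeled here. The sketch only assumes the standard decomposition of a depthwise separable convolution into a depthwise convolution followed by a 1x1 pointwise convolution, where the pointwise step is a matrix multiplication over channels. The function names (`depthwise`, `pointwise`, `fused`) and the per-pixel buffer `buf` are hypothetical, chosen for illustration.

```python
def depthwise(x, k):
    """Depthwise convolution, stride 1, no padding.
    x: [C][H][W] input, k: [C][KH][KW] one filter per channel."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    KH, KW = len(k[0]), len(k[0][0])
    return [[[sum(x[c][i + di][j + dj] * k[c][di][dj]
                  for di in range(KH) for dj in range(KW))
              for j in range(W - KW + 1)]
             for i in range(H - KH + 1)]
            for c in range(C)]


def pointwise(y, w):
    """1x1 convolution, i.e. a matrix multiply over the channel axis.
    y: [C][H][W] activations, w: [M][C] weights -> [M][H][W]."""
    C, H, W = len(y), len(y[0]), len(y[0][0])
    return [[[sum(w[m][c] * y[c][i][j] for c in range(C))
              for j in range(W)]
             for i in range(H)]
            for m in range(len(w))]


def fused(x, k, w):
    """Fused variant: the depthwise result for one output pixel is kept in a
    small local buffer and consumed immediately by the pointwise step, so the
    full intermediate tensor is never materialized."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    KH, KW = len(k[0]), len(k[0][0])
    OH, OW, M = H - KH + 1, W - KW + 1, len(w)
    out = [[[0] * OW for _ in range(OH)] for _ in range(M)]
    for i in range(OH):
        for j in range(OW):
            # Assumption: buf stands in for on-chip storage of C intermediate values.
            buf = [sum(x[c][i + di][j + dj] * k[c][di][dj]
                       for di in range(KH) for dj in range(KW))
                   for c in range(C)]
            for m in range(M):
                out[m][i][j] = sum(w[m][c] * buf[c] for c in range(C))
    return out
```

In the unfused path, `pointwise(depthwise(x, k), w)` builds the whole intermediate activation tensor before the second layer reads it; in an accelerator that tensor would typically travel through external memory. The fused path produces the same output while holding only `C` intermediate values at a time, which is the kind of traffic the abstract's fusion scheme aims to avoid.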
