Conference: Design, Automation and Test in Europe Conference and Exhibition

Exploration of Memory Access Optimization for FPGA-based 3D CNN Accelerator

Abstract

Three-dimensional convolutional networks (3D CNNs) are used effectively in various video recognition applications. Compared to traditional 2D CNNs, the extra temporal dimension makes 3D CNNs more computationally intensive and gives them a larger memory footprint, so memory optimization is crucial. This paper presents a design space exploration of memory access optimization for an FPGA-based 3D CNN accelerator. We present a non-overlapping data tiling method for contiguous off-chip memory access and explore on-chip data reuse opportunities by leveraging different loop ordering strategies. We propose a hardware architecture that can flexibly support a different loop ordering strategy for each 3D CNN layer. With the help of hardware/software co-design, we can provide the optimal configuration for an energy-efficient, high-performance accelerator design. In experiments on AlexNet, VGG16, and C3D, our optimal model reduces DRAM accesses and energy consumption on C3D by up to 84% and 55%, respectively, compared to a baseline model, and demonstrates state-of-the-art performance compared to prior FPGA implementations.
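As a rough illustration of why loop ordering affects on-chip data reuse, the following Python sketch counts how often a tile buffer must be refilled under different loop orders. It is not the paper's architecture or cost model. The three tiled dimensions (`m` for output channels, `c` for input channels, `s` for spatio-temporal position), the assumption of one single-tile buffer each for inputs, weights, and outputs, and the function name `count_tile_fetches` are all hypothetical choices made for this example.

```python
from itertools import product


def count_tile_fetches(order, tiles):
    """Count how often each single-tile buffer changes its contents.

    order: loop nest from outermost to innermost, e.g. ("m", "c", "s")
    tiles: number of tiles per dimension, e.g. {"m": 2, "c": 3, "s": 4}

    Assumed toy model: each buffer holds exactly one tile, and a tile
    is transferred whenever the indices that identify it change.
    """
    # Loop indices that identify the tile held in each buffer.
    deps = {"input": ("c", "s"), "weight": ("m", "c"), "output": ("m", "s")}
    held = {name: None for name in deps}
    fetches = {name: 0 for name in deps}

    # Walk the tiled loop nest in the requested order.
    for idx in product(*(range(tiles[d]) for d in order)):
        pos = dict(zip(order, idx))
        for name, dims in deps.items():
            key = tuple(pos[d] for d in dims)
            if held[name] != key:  # buffer contents must be swapped
                held[name] = key
                fetches[name] += 1
    return fetches


if __name__ == "__main__":
    tiles = {"m": 2, "c": 3, "s": 4}
    for order in [("m", "c", "s"), ("m", "s", "c"), ("c", "s", "m")]:
        print(order, count_tile_fetches(order, tiles))
```

In this toy model, the buffer whose tile does not depend on the innermost loop index is the one that gets reused: placing `s` innermost keeps the weight tile resident, `c` innermost keeps the output tile resident, and `m` innermost keeps the input tile resident. Which choice is best for a real layer depends on the relative tile sizes, which is the kind of trade-off a per-layer loop ordering exploration examines.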
