Exploration of Memory Access Optimization for FPGA-based 3D CNN Accelerator

Abstract

Three-dimensional convolutional networks (3D CNNs) are used effectively in various video recognition applications. Compared with traditional 2D CNNs, the extra temporal dimension makes 3D CNNs more computationally intensive and gives them a larger memory footprint, so memory optimization is crucial. This paper presents a design space exploration of memory access optimization for an FPGA-based 3D CNN accelerator. We present a non-overlapping data tiling method for contiguous off-chip memory access and explore on-chip data reuse opportunities by leveraging different loop ordering strategies. We propose a hardware architecture that flexibly supports a different loop ordering strategy for each 3D CNN layer. With the help of hardware/software co-design, we provide the optimal configuration for an energy-efficient, high-performance accelerator design. In experiments on AlexNet, VGG16, and C3D, our optimal model reduces up to 84% of DRAM accesses and 55% of energy consumption on C3D compared with a baseline model, and demonstrates state-of-the-art performance compared with prior FPGA implementations.
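The abstract gives no implementation details, so the following is only a minimal sketch of why loop ordering changes off-chip traffic; it is not the authors' method. It assumes a hypothetical on-chip buffer that holds exactly one input tile and one weight tile, ignores convolution halos and output write-back, and uses made-up layer and tile sizes. The names `tile_ranges` and `count_dram_words` are illustrative only.

```python
from itertools import product


def tile_ranges(extent, tile):
    """Split [0, extent) into non-overlapping [start, stop) ranges."""
    return [(s, min(s + tile, extent)) for s in range(0, extent, tile)]


def count_dram_words(order, n_spatial, n_out, input_words, weight_words):
    """Count words fetched from off-chip memory for a two-level tile loop.

    Assumed model: one input tile and one weight tile fit on chip, and a
    tile is re-fetched only when the loop moves to a different tile.
    order="spatial_outer" sweeps output-channel tiles inside each spatial
    tile; order="channel_outer" sweeps spatial tiles inside each
    output-channel tile.
    """
    if order == "spatial_outer":
        schedule = product(range(n_spatial), range(n_out))
    elif order == "channel_outer":
        schedule = ((s, m) for m, s in product(range(n_out), range(n_spatial)))
    else:
        raise ValueError(f"unknown loop order: {order}")

    resident_input = resident_weight = None
    words = 0
    for s, m in schedule:
        if s != resident_input:      # input tile not on chip: fetch it
            words += input_words
            resident_input = s
        if m != resident_weight:     # weight tile not on chip: fetch it
            words += weight_words
            resident_weight = m
    return words


# Hypothetical layer (not from the paper): 16 frames of 112x112 pixels,
# tiled into 4x28x28 blocks, with 4 output-channel tiles.
spatial_tiles = list(
    product(tile_ranges(16, 4), tile_ranges(112, 28), tile_ranges(112, 28))
)
input_words = 4 * 28 * 28 * 3        # one input tile, 3 channels (assumed)
weight_words = 3 * 3 * 3 * 3 * 16    # one weight tile (assumed)

traffic = {
    order: count_dram_words(order, len(spatial_tiles), 4, input_words, weight_words)
    for order in ("spatial_outer", "channel_outer")
}
```

Under these assumptions the two orders fetch different amounts of data for the same layer, and which one wins depends on the relative sizes of input and weight tiles. That is one plausible reason a per-layer choice of loop order, as the abstract describes, could be useful.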

Bibliographic record

Source
Design, Automation and Test in Europe Conference and Exhibition | 2020 | pp. 1650-1655 | 6 pages
Authors
Teng Tian; Xi Jin; Letian Zhao; Xiaotian Wang; Jie Wang; Wei Wu
Original format: PDF
Keywords
3D CNN; data tiling; loop ordering; energy-efficient

