IEEE Transactions on Very Large Scale Integration (VLSI) Systems

OPU: An FPGA-Based Overlay Processor for Convolutional Neural Networks


Abstract

Field-programmable gate arrays (FPGAs) provide rich parallel computing resources with high energy efficiency, making them ideal for accelerating deep convolutional neural networks (CNNs). In recent years, automatic compilers have been developed to generate network-specific FPGA accelerators. However, as increasingly complicated tasks adopt cascades of deep CNN algorithms, reconfiguring the FPGA at runtime becomes unavoidable when network-specific accelerators are employed. Such reconfiguration can be difficult for edge devices. Moreover, a network-specific accelerator requires regenerating the RTL code and redoing the physical implementation whenever the network is updated, which is not easy for CNN end users. In this article, we propose a domain-specific FPGA overlay processor, named OPU, to accelerate CNNs. It offers software-like programmability to CNN end users: CNN algorithms are automatically compiled into executable code, which OPU loads and executes without FPGA reconfiguration when networks are switched or updated. OPU instructions perform complicated functions with variable runtimes but have a uniform length. The instruction granularity is optimized to provide good performance and sufficient flexibility while reducing the complexity of developing the microarchitecture and compiler. Experiments show that OPU achieves an average runtime multiply-and-accumulate unit (MAC) efficiency (RME) of 91% across nine different networks. Moreover, for VGG and YOLO networks, OPU outperforms automatically compiled network-specific accelerators in the literature. In addition, OPU shows 5.35x better power efficiency than the Titan Xp GPU. In a real-time cascaded-CNN scenario, OPU is 2.9x faster than the edge-computing GPU Jetson TX2, which has a similar amount of computing resources.
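The abstract does not spell out how RME is computed. A plausible reading, consistent with how MAC-array utilization is commonly reported (this formula is an illustrative assumption, not a definition taken from the paper), is

$$\mathrm{RME} = \frac{\#\,\text{MAC operations required by the network}}{N_{\mathrm{MAC}} \cdot f_{\mathrm{clk}} \cdot T_{\mathrm{run}}}$$

where $N_{\mathrm{MAC}}$ is the number of physical MAC units, $f_{\mathrm{clk}}$ the clock frequency, and $T_{\mathrm{run}}$ the measured end-to-end runtime. Under this reading, a 91% RME means the MAC array performs useful work in 91% of all available cycles.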
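To make "uniform length but variable runtime" concrete, the minimal Python sketch below models a fixed 32-bit instruction word whose execution time depends on how much data it covers. All field widths, opcodes, and latency numbers here are hypothetical illustrations, not the actual OPU ISA.

```python
# Hypothetical 32-bit instruction layout (illustrative only, not the OPU ISA):
#   [31:28] opcode   [27:16] length field   [15:0] mode/addressing flags

def decode(word: int) -> tuple[int, int, int]:
    """Split a fixed-width 32-bit instruction word into its fields."""
    opcode = (word >> 28) & 0xF       # which operation to run
    length = (word >> 16) & 0xFFF     # how much data the operation covers
    flags = word & 0xFFFF             # remaining mode/addressing bits
    return opcode, length, flags

def cycles(opcode: int, length: int) -> int:
    """Toy latency model: every instruction has the same width,
    but its runtime grows with the workload it describes."""
    setup = {0x1: 2, 0x2: 8}.get(opcode, 1)  # assumed per-op setup cost
    return setup + length

op, n, _ = decode(0x10040000)   # opcode=1, length=4, flags=0
print(cycles(op, n))            # -> 6 cycles for this one 32-bit word
```

A fixed word width keeps instruction fetch and compilation simple, while the data-dependent latency lets a single coarse-grained instruction cover many MAC cycles; this trade-off is one way to read the abstract's "optimized instruction granularity."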
