Microprocessors and Microsystems

Throughput optimizations for FPGA-based deep neural network inference

Abstract

Deep neural networks are an extremely successful and widely used technique for various pattern recognition and machine learning tasks. Due to power and resource constraints, these computationally intensive networks are difficult to implement in embedded systems, yet the number of embedded applications that could benefit from them is rising rapidly. In this paper, we propose novel architectures for the inference of arbitrary, previously trained deep neural networks on FPGA-based SoCs that overcome these limitations. Our key contributions are the reuse of already transferred weight matrices across multiple input samples, which we refer to as batch processing, and the use of compressed weight matrices, also known as pruning. We present an extensive evaluation of these optimizations. Both techniques significantly reduce data transfers and speed up network inference by one order of magnitude. At the same time, we surpass the data throughput of fully featured x86-based systems while consuming only a fraction of their energy.
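
The two optimizations named in the abstract, batch processing (reusing an already transferred weight matrix across several input samples) and pruning (storing compressed weight matrices), can be illustrated with a minimal NumPy/SciPy sketch. This is not the paper's FPGA implementation; the layer sizes, batch size, and pruning threshold below are hypothetical and chosen only to show why both techniques reduce the weight data that must be moved per sample.

```python
# Minimal sketch (not the paper's design): batch processing and pruning
# for a fully-connected layer, to illustrate the data-transfer savings.
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)

# Hypothetical layer and batch sizes, for illustration only.
n_in, n_out, batch = 1024, 512, 8

W = rng.standard_normal((n_out, n_in)).astype(np.float32)   # layer weights
X = rng.standard_normal((n_in, batch)).astype(np.float32)   # batch of inputs

# Without batching, W would be streamed to the accelerator once per sample.
# With batch processing, a single transfer of W serves all `batch` samples:
Y = np.maximum(W @ X, 0.0)            # ReLU over the whole batch

# Pruning: zero out small weights, then keep the matrix in a compressed
# (sparse) format so less weight data has to be transferred.
threshold = np.quantile(np.abs(W), 0.9)          # keep the largest 10%
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)
W_sparse = csr_matrix(W_pruned)

Y_pruned = np.maximum(W_sparse @ X, 0.0)

print(f"dense weight bytes:  {W.nbytes}")
print(f"sparse weight bytes: "
      f"{W_sparse.data.nbytes + W_sparse.indices.nbytes + W_sparse.indptr.nbytes}")
```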