Microprocessors and Microsystems

Throughput optimizations for FPGA-based deep neural network inference

Abstract

Deep neural networks are an extremely successful and widely used technique for various pattern recognition and machine learning tasks. Due to power and resource constraints, these computationally intensive networks are difficult to implement in embedded systems, yet the number of embedded applications that could benefit from them is rising rapidly. In this paper, we propose novel architectures for the inference of arbitrary, previously trained deep neural networks on FPGA-based SoCs that overcome these limitations. Our key contributions are the reuse of already transferred weight matrices across multiple input samples, which we refer to as batch processing, and the use of compressed weight matrices, also known as pruning. We present an extensive evaluation of these optimizations. Both techniques significantly reduce data transfers and speed up network inference by one order of magnitude. At the same time, we surpass the data throughput of fully featured x86-based systems while consuming only a fraction of their energy.
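
The two optimizations named in the abstract, batch processing (reusing an already transferred weight matrix across several input samples) and pruning (storing compressed weight matrices), can be illustrated with a minimal NumPy/SciPy sketch. This is not the paper's FPGA implementation; the layer sizes, batch size, and pruning threshold below are hypothetical and chosen only to show why both techniques reduce the weight data that must be moved per sample.

```python
# Minimal sketch (not the paper's design): batch processing and pruning
# for a fully-connected layer, to illustrate the data-transfer savings.
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)

# Hypothetical layer and batch sizes, for illustration only.
n_in, n_out, batch = 1024, 512, 8

W = rng.standard_normal((n_out, n_in)).astype(np.float32)   # layer weights
X = rng.standard_normal((n_in, batch)).astype(np.float32)   # batch of inputs

# Without batching, W would be streamed to the accelerator once per sample.
# With batch processing, a single transfer of W serves all `batch` samples:
Y = np.maximum(W @ X, 0.0)            # ReLU over the whole batch

# Pruning: zero out small weights, then keep the matrix in a compressed
# (sparse) format so less weight data has to be transferred.
threshold = np.quantile(np.abs(W), 0.9)          # keep the largest 10%
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)
W_sparse = csr_matrix(W_pruned)

Y_pruned = np.maximum(W_sparse @ X, 0.0)

print(f"dense weight bytes:  {W.nbytes}")
print(f"sparse weight bytes: "
      f"{W_sparse.data.nbytes + W_sparse.indices.nbytes + W_sparse.indptr.nbytes}")
```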