The 24th IEEE International Symposium on Field-Programmable Custom Computing Machines

FPGA-Based Reduction Techniques for Efficient Deep Neural Network Deployment



Abstract

Deep neural networks have been shown to outperform prior state-of-the-art solutions that often relied heavily on hand-engineered feature extraction techniques coupled with simple classification algorithms. In particular, deep max-pooling convolutional neural networks (MPCNN) have been shown to dominate on several popular public benchmarks. Unfortunately, the benefits of deep networks have yet to be exploited in embedded, resource-bound settings with strict power and area budgets. GPUs have been shown to improve throughput and energy efficiency over CPUs due to their parallel architecture. In a similar fashion, FPGAs can improve performance while allowing finer control over the implementation. To meet power, area, and latency constraints, it is necessary to develop network reduction strategies in addition to optimal mapping. This work examines two specific reduction techniques: limited-precision arithmetic in both fixed-point and floating-point formats, and weight-matrix truncation using singular value decomposition. An FPGA-based framework is also proposed and used to deploy the trained networks. To demonstrate, networks trained on several public computer vision datasets, including MNIST, CIFAR-10, and SVHN, are fully implemented on a low-power Xilinx Artix-7 FPGA. Experimental results show that all networks achieve a classification throughput of 16 img/sec and consume less than 700 mW when running at 200 MHz. In addition, the reduced networks, on average, lower power and area utilization by 37% and 44%, respectively, while incurring less than a 0.20% decrease in accuracy.
