IEEE International Symposium on High Performance Computer Architecture

Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework


Abstract

Deep Neural Networks (DNNs) have achieved extraordinary performance in various application domains. To support diverse DNN models, efficient implementations of DNN inference on edge-computing platforms, e.g., ASICs, FPGAs, and embedded systems, are extensively investigated. Due to the huge model sizes and computation amounts, model compression is a critical step in deploying DNN models on edge devices. This paper focuses on weight quantization, a hardware-friendly model compression approach that is complementary to weight pruning. Unlike existing methods that use the same quantization scheme for all weights, we propose the first solution that applies different quantization schemes to different rows of the weight matrix. It is motivated by (1) the observation that the weight distributions of different rows are not the same, and (2) the potential to achieve better utilization of heterogeneous FPGA hardware resources. To achieve this, we first propose a hardware-friendly quantization scheme named sum-of-power-of-2 (SP2), suitable for Gaussian-like weight distributions, in which the multiplication arithmetic can be replaced with logic shifters and adders, thereby enabling highly efficient implementations with FPGA LUT resources. In contrast, the existing fixed-point quantization is suitable for uniform-like weight distributions and can be implemented efficiently by DSPs. Then, to fully exploit both types of resources, we propose an FPGA-centric mixed-scheme quantization (MSQ) with an ensemble of the proposed SP2 and fixed-point schemes. Combining the two schemes can maintain, or even increase, accuracy due to better matching with weight distributions. For the FPGA implementations, we develop a parameterized architecture with heterogeneous Generalized Matrix Multiplication (GEMM) cores: one using LUTs for computations with SP2-quantized weights, the other utilizing DSPs for fixed-point-quantized weights. Given the partition ratio between the two schemes based on resource characterization, the MSQ quantization training algorithm derives an optimally quantized model for the FPGA implementation. We evaluate our FPGA-centric quantization framework across multiple application domains. With optimal SP2/fixed-point ratios on two FPGA devices, i.e., Zynq XC7Z020 and XC7Z045, we achieve performance improvements of 2.1× to 4.1× compared to solely exploiting DSPs for all multiplication operations. In addition, CNN implementations with the proposed MSQ scheme achieve higher accuracy and comparable hardware utilization efficiency compared to state-of-the-art designs.
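
As a rough illustration of the SP2 idea described in the abstract, the Python sketch below shows how a weight quantized to a sum of two powers of two turns a multiplication into two shifts and an add, the operation pattern that maps onto LUTs. The exponent range and quantization grid are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of sum-of-power-of-2 (SP2) quantization: each quantized
# weight magnitude is 2^a + 2^b, so x * w needs only shifts and an add.
# The exponent bound and grid below are assumptions for illustration.

def sp2_levels(max_exp=4):
    """Enumerate illustrative SP2 magnitudes 2^a + 2^b with a >= b."""
    levels = {0}
    for a in range(max_exp + 1):
        for b in range(a + 1):
            levels.add(2**a + 2**b)
    return sorted(levels)

def quantize_sp2(w, levels):
    """Map an integer weight to the nearest SP2 level, keeping its sign."""
    mag = min(levels, key=lambda lv: abs(abs(w) - lv))
    return mag if w >= 0 else -mag

def sp2_multiply(x, a, b):
    """x * (2^a + 2^b) using only shifts and an add -- the LUT-friendly path."""
    return (x << a) + (x << b)

if __name__ == "__main__":
    levels = sp2_levels()
    print(quantize_sp2(13, levels))   # 13 -> 12 = 2^3 + 2^2
    print(sp2_multiply(5, 3, 2))      # 5 * 12 = 40 + 20 = 60
```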
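The row-wise mixed scheme can be sketched in the same spirit: given a partition ratio, some rows of a weight matrix are quantized with SP2 (destined for the LUT-based GEMM core) and the rest with fixed-point (for the DSP-based core). In the sketch below, the ratio, bit width, and "first rows get SP2" selection rule are placeholder assumptions; per the abstract, the paper derives the assignment through its MSQ quantization training algorithm.

```python
# Illustrative row-wise mixed-scheme quantization (MSQ) sketch. The
# partition ratio, bit width, and row-selection rule are placeholders.
import numpy as np

# SP2 magnitudes 2^a + 2^b for 0 <= b <= a <= 3 (see the previous sketch).
SP2_LEVELS = np.array([0, 2, 3, 4, 5, 6, 8, 9, 10, 12, 16], dtype=float)

def quantize_fixed_point(row, bits=4):
    """Uniform fixed-point quantization of one row to `bits` bits."""
    scale = max(np.abs(row).max(), 1e-8) / (2**(bits - 1) - 1)
    return np.round(row / scale) * scale

def quantize_sp2_row(row):
    """Snap each entry of one row onto a scaled SP2 grid, keeping signs."""
    scale = max(np.abs(row).max(), 1e-8) / SP2_LEVELS.max()
    grid = SP2_LEVELS * scale
    idx = np.abs(np.abs(row)[:, None] - grid[None, :]).argmin(axis=1)
    return np.sign(row) * grid[idx]

def msq_quantize(W, sp2_ratio=0.5):
    """Quantize a weight matrix row by row under a given SP2/fixed-point ratio."""
    split = int(sp2_ratio * W.shape[0])
    return np.stack([quantize_sp2_row(r) if i < split else quantize_fixed_point(r)
                     for i, r in enumerate(W)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 8))
    print(msq_quantize(W, sp2_ratio=0.5))
```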
