IEEE International Conference on Soft Computing and Machine Intelligence

GPQ: Greedy Partial Quantization of Convolutional Neural Networks Inspired by Submodular Optimization

Abstract

Recent work has revealed that the effect of neural network quantization on inference accuracy differs from layer to layer. Therefore, partial quantization and mixed-precision quantization have been studied for neural network accelerators with multi-precision designs. However, these quantization methods generally either require network training, which entails a high computational cost, or exhibit a significant loss of inference accuracy. In this paper, we propose a greedy search algorithm for partial quantization that can derive optimal combinations of quantization layers; notably, the proposed method has a low computational complexity of O(N²), where N denotes the number of layers. The proposed Greedy Partial Quantization (GPQ) achieved 4.2× model size compression with only -0.03% accuracy loss on ResNet50, and 2.5× compression with a +0.015% accuracy gain on Xception. The computational cost of GPQ is only 2.5 GPU-hours for 8-bit quantization of EfficientNet-B0 on ImageNet classification.

