
Fully integer-based quantization for mobile convolutional neural network inference


Abstract

Deploying deep convolutional neural networks on mobile devices is challenging because of the conflict between their heavy computational overhead and the hardware's restricted computing capacity. Network quantization is typically used to alleviate this problem. However, we found that a "datatype mismatch" issue in existing low-bitwidth quantization approaches can generate severe instruction redundancy, dramatically reducing their running efficiency on mobile devices. We therefore propose a novel quantization approach which ensures that only integer-based arithmetic is needed during the inference stage of the quantized model. To this end, we improved the quantization function to compel the quantized values to follow a standard integer format. We then propose to simultaneously quantize the batch normalization parameters with a logarithm-like method. By doing so, the quantized model keeps the advantage of low-bitwidth representation while preventing the "datatype mismatch" issue and the corresponding instruction redundancy. Comprehensive experiments show that our method achieves prediction accuracy comparable to other state-of-the-art methods while reducing run-time latency by a large margin. Our fully integer-based quantized ResNet-18 has 4-bit weights, 4-bit activations, and only a 0.7% top-1 and 0.4% top-5 accuracy drop on the ImageNet dataset. The assembly language implementation of a series of building blocks can reach a maximum of 4.33x the speed of the original full-precision version on an ARMv8 CPU. (c) 2020 Elsevier B.V. All rights reserved.
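The two ingredients the abstract describes can be sketched in a few lines: symmetric uniform quantization of tensors to low-bitwidth signed integers, and rounding the rescaling factor to a power of two (a "logarithm-like" quantization) so that it can be applied with an integer bit shift instead of a floating-point multiply. This is an illustrative sketch under those assumptions, not the paper's exact algorithm; the function names and the max-abs scale calibration are hypothetical.

```python
import numpy as np

def quantize_int(x, bits=4):
    # Symmetric uniform quantization to signed integers in
    # [-2^(bits-1), 2^(bits-1) - 1], using a max-abs scale.
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def log2_quantize_scale(scale):
    # Round a real-valued rescaling factor to the nearest power of two.
    # The effective scale becomes 2**shift, so dequantization can be
    # implemented as an integer shift (no floating-point arithmetic).
    return int(np.round(np.log2(scale)))

# Example: quantize a small tensor to 4-bit integers.
q, s = quantize_int(np.array([-1.0, 0.5, 1.0]), bits=4)
print(q)                        # integer codes in [-8, 7]
print(log2_quantize_scale(0.25))  # shift of -2, i.e. scale 2**-2
```

Constraining the scale to a power of two is what keeps the whole inference path integer-only: accumulation happens in wide integers and the final rescale is a shift, avoiding the mixed integer/float instructions that cause the "datatype mismatch" redundancy.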

Bibliographic information

  • Source
    Neurocomputing | 2021, Issue 7 | pp. 194-205 | 12 pages
  • Author affiliations

    Tongji Univ, Coll Elect & Informat Engn, Dept Control Sci & Engn, Shanghai 201804, Peoples R China;

    Tongji Univ, Coll Elect & Informat Engn, Dept Control Sci & Engn, Shanghai 201804, Peoples R China | Tongji Univ, Shanghai Inst Intelligent Sci & Technol, Shanghai 201804, Peoples R China;

    Tongji Univ, Coll Elect & Informat Engn, Dept Control Sci & Engn, Shanghai 201804, Peoples R China;

    Tongji Univ, Coll Elect & Informat Engn, Dept Control Sci & Engn, Shanghai 201804, Peoples R China;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Format: PDF
  • Language: eng
  • Keywords

    Convolutional neural network; Quantization; Model compression; Deep learning;
