AAAI Conference on Artificial Intelligence

DARB: A Density-Adaptive Regular-Block Pruning for Deep Neural Networks



Abstract

The rapidly growing parameter volume of deep neural networks (DNNs) hinders artificial intelligence applications on resource-constrained devices, such as mobile and wearable devices. Neural network pruning, one of the mainstream model compression techniques, is under extensive study as a way to reduce model size and thus the amount of computation, so that state-of-the-art DNNs can be deployed on such devices with high runtime energy efficiency. In contrast to irregular pruning, which incurs high index-storage and decoding overhead, structured pruning techniques have been proposed as promising solutions. However, prior studies on structured pruning tackle the problem mainly from the perspective of facilitating hardware implementation, without analyzing the characteristics of sparse neural networks in depth. This neglect leads to an inefficient trade-off between regularity and pruning ratio, so the potential of structurally pruning neural networks is not sufficiently mined. In this work, we examine the structural characteristics of irregularly pruned weight matrices, such as the diverse redundancy of different rows, the sensitivity of different rows to pruning, and the position characteristics of retained weights. Using these insights as guidance, we first propose the novel block-max weight masking (BMWM) method, which effectively retains the salient weights while imposing high regularity on the weight matrix. As a further optimization, we propose density-adaptive regular-block (DARB) pruning, which effectively exploits the intrinsic characteristics of neural networks and thereby outperforms prior structured pruning work in both pruning ratio and decoding efficiency. Our experimental results show that DARB achieves 13× to 25× pruning ratios, a 2.8× to 4.3× improvement over state-of-the-art counterparts on multiple neural network models and tasks. Moreover, DARB achieves 14.3× higher decoding efficiency than block pruning while attaining a higher pruning ratio.
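The abstract describes BMWM and DARB only at a high level. The minimal numpy sketch below illustrates one plausible reading of the two ideas: keep the largest-magnitude weight in each contiguous block of a row (block-max masking), and choose the block width per row according to how dense that row is in salient weights (density adaptivity). The function names, density measure, and candidate block widths are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def block_max_mask(weights, block_sizes):
    """Keep only the largest-magnitude weight in each contiguous block of
    every row; block_sizes[r] is the block width used for row r."""
    mask = np.zeros_like(weights, dtype=bool)
    for r, bs in enumerate(block_sizes):
        for start in range(0, weights.shape[1], int(bs)):
            block = np.abs(weights[r, start:start + int(bs)])
            mask[r, start + int(np.argmax(block))] = True
    return weights * mask, mask

def density_adaptive_block_sizes(weights, candidates=(2, 4, 8, 16), keep_ratio=1/16):
    """Assign a per-row block width (hypothetical heuristic): rows with a
    higher density of large-magnitude ("salient") weights get smaller
    blocks, so more of their weights survive; sparser rows get larger
    blocks."""
    thresh = np.quantile(np.abs(weights), 1.0 - keep_ratio)
    density = (np.abs(weights) > thresh).mean(axis=1)   # per-row salient density
    ranks = density.argsort()[::-1]                      # densest rows first
    sizes = np.empty(weights.shape[0], dtype=int)
    for i, row in enumerate(ranks):
        # Bucket rows evenly across the candidate widths, densest -> smallest.
        sizes[row] = candidates[min(i * len(candidates) // len(ranks),
                                    len(candidates) - 1)]
    return sizes

# Toy usage: prune a random 8x64 weight matrix with per-row block widths.
W = np.random.randn(8, 64)
sizes = density_adaptive_block_sizes(W)
W_pruned, M = block_max_mask(W, sizes)
print("retained", int(M.sum()), "of", W.size, "weights")
```

With one survivor per block, a row assigned block width b is pruned by roughly b×, so mixing widths across rows is what lets the overall pruning ratio adapt to how redundancy is distributed in the matrix.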
