IEEE International Conference on Artificial Intelligence Circuits and Systems

HPPU: An Energy-Efficient Sparse DNN Training Processor with Hybrid Weight Pruning



Abstract

Motivated by the fact that deep neural networks (DNNs) are typically highly over-parameterized, weight-pruning-based sparse training (ST) has become a practical method to reduce training computation and compress models. However, previous pruning algorithms use either a coarse-grained or a fine-grained pattern, leading to a limited pruning ratio or a drastically irregular sparsity distribution, which makes hardware implementation computation-intensive or logic-complex. Meanwhile, current DNN processors focus on sparse inference and cannot support emerging ST techniques. This paper proposes a co-design approach in which the algorithm is adapted to suit the hardware constraints and the hardware exploits the algorithm's properties to accelerate sparse training. We first present a novel pruning algorithm, hybrid weight pruning, which combines channel-wise and line-wise pruning. It reaches a considerable pruning ratio while remaining hardware-friendly. We then design a hardware architecture, the Hybrid Pruning Processing Unit (HPPU), to accelerate the proposed algorithm. It employs a 2-level active data selector and a sparse convolution engine, which maximize hardware utilization when handling the hybrid sparsity patterns during training. We evaluate HPPU by synthesizing it with 28nm CMOS technology. HPPU achieves a 50.1% higher pruning ratio than coarse-grained pruning and 1.53× higher energy efficiency than fine-grained pruning. The peak energy efficiency of HPPU is 126.04 TFLOPS/W, outperforming the state-of-the-art trainable processor GANPU by 1.67×. When training a ResNet18 model, HPPU consumes 3.72× less energy, offers a 4.69× speedup, and maintains unpruned accuracy.
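The hybrid scheme pairs a coarse channel-wise step with a finer line-wise step inside the surviving channels. The abstract does not specify the pruning criteria, so the sketch below is an illustration only: the L2/L1 magnitude rankings, the `channel_ratio` and `line_ratio` parameters, and the choice of kernel rows as "lines" are all assumptions, not the authors' method.

```python
import numpy as np

def hybrid_prune(weights, channel_ratio=0.3, line_ratio=0.5):
    """Illustrative hybrid pruning of a conv weight tensor shaped
    (out_channels, in_channels, kH, kW).

    1) Channel-wise (coarse): zero whole output channels with the
       smallest L2 norms -- a regular, hardware-friendly pattern.
    2) Line-wise (finer): within surviving channels, zero entire
       kernel rows with the smallest L1 norms -- still structured
       enough for simple sparse indexing.
    """
    w = weights.copy()
    oc = w.shape[0]

    # --- channel-wise pruning: rank output channels by L2 norm ---
    ch_norms = np.linalg.norm(w.reshape(oc, -1), axis=1)
    n_ch = int(oc * channel_ratio)
    pruned_ch = np.argsort(ch_norms)[:n_ch]
    w[pruned_ch] = 0.0

    # --- line-wise pruning on the surviving channels ---
    kept = np.setdiff1d(np.arange(oc), pruned_ch)
    for c in kept:
        # L1 norm of each kernel row, summed across input channels
        line_norms = np.abs(w[c]).sum(axis=(0, 2))  # shape (kH,)
        n_ln = int(len(line_norms) * line_ratio)
        if n_ln > 0:
            w[c][:, np.argsort(line_norms)[:n_ln], :] = 0.0
    return w

# Example: prune a random 8x4x3x3 conv layer.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4, 3, 3))
pruned = hybrid_prune(w)
sparsity = 1.0 - np.count_nonzero(pruned) / pruned.size
```

With these (assumed) ratios, 2 of 8 channels and 1 of 3 kernel rows in each remaining channel are zeroed, giving 50% overall sparsity while keeping the zeros in large regular blocks, which is the property that lets a selector like HPPU's skip work without fine-grained index bookkeeping.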
