IEEE Journal of Solid-State Circuits

Trainer: An Energy-Efficient Edge-Device Training Processor Supporting Dynamic Weight Pruning



Abstract

Transfer learning, which transfers knowledge from source datasets to target datasets, is practical for adaptive deep neural network (DNN) applications. When user privacy and communication bandwidth are concerns, training on edge devices is essential for transfer learning. Nevertheless, training requires repeating feedforward (FF), backpropagation (BP), and weight gradient (WG) computation millions of times, introducing prohibitive computation for edge devices. A promising method to reduce training computation is sparse DNN training (SDT), which dynamically prunes weights during training iterations and performs FF, BP, and WG only with unpruned weights. However, SDT suffers from implicit redundancy and reuse imbalance in convolution layers. In addition, it shifts the bottleneck to batch normalization (BN) layers. It is therefore challenging to achieve energy-efficient SDT computing. This article proposes a processor, Trainer, that solves the above challenges with three features. First, a speculation mechanism removes implicit redundant operations, which have nonzero inputs, weights, or outputs but are ineffective for training. Second, a dynamic sparsity adaptive dataflow tackles the reuse imbalance, improving energy efficiency (EE) for dynamic sparse convolution in SDT. Third, a computational-dependence-decoupled BN unit eliminates BN's repeated data accesses to reduce training energy and time. Trainer is fabricated in 28-nm CMOS technology and occupies 20.96 mm² of area. It achieves a peak EE of 173.28 TFLOPS/W at FP16 (276.55 TFLOPS/W at FP8) for 90% activation sparsity and 90% weight sparsity. The sparsity-to-EE conversion ratio is 80.9, outperforming the previous work by 1.55×. When training a ResNet18 model with SDT, Trainer reduces energy by 2.23× and time by 1.76× compared with the state-of-the-art sparse training processor.
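To illustrate the SDT idea described above — re-pruning weights every iteration and performing FF, BP, and WG only with the unpruned weights — here is a minimal software sketch. It uses generic magnitude-based pruning on a single linear layer; the `prune_mask` helper, layer shapes, and pruning criterion are illustrative assumptions, not the paper's hardware mechanism.

```python
import numpy as np

def prune_mask(w, sparsity):
    """Illustrative magnitude pruning: keep the largest-|w| entries,
    zero out the smallest `sparsity` fraction."""
    k = int(w.size * sparsity)              # number of weights to prune
    if k == 0:
        return np.ones_like(w, dtype=bool)
    thresh = np.sort(np.abs(w), axis=None)[k - 1]
    return np.abs(w) > thresh

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4))                 # dense weight matrix
x = rng.normal(size=(16, 8))                # batch of inputs
y = rng.normal(size=(16, 4))                # regression targets
lr, sparsity = 0.01, 0.5

for step in range(100):
    mask = prune_mask(w, sparsity)          # dynamic: re-prune each iteration
    w_sparse = w * mask                     # only unpruned weights participate
    out = x @ w_sparse                      # feedforward (FF)
    grad_out = 2.0 * (out - y) / len(x)     # gradient of MSE loss (BP)
    grad_w = (x.T @ grad_out) * mask        # weight gradient (WG), masked
    w -= lr * grad_w
```

Because the mask is recomputed every iteration, pruned weights can later re-enter training if their magnitude grows — the "dynamic" aspect of SDT. In Trainer, the corresponding savings come from skipping the pruned operands in hardware rather than multiplying by zeros as this sketch does.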
