IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Enhancing Utilization of SIMD-Like Accelerator for Sparse Convolutional Neural Networks



Abstract

Although existing single-instruction-multiple-data (SIMD)-like accelerators can handle the compressed formats of sparse convolutional neural networks, the sparse and irregular distribution of nonzero elements causes low multiplier utilization within a processing engine (PE) and imbalanced computation across PEs. This brief addresses these issues by proposing a data screening and task mapping (DSTM) accelerator that integrates a series of techniques spanning software refinement and hardware modules. An efficient indexing module is introduced to identify the effectual computation pairs and skip unnecessary computation in a fine-grained manner. Intra-PE load imbalance is alleviated through weight data rearrangement, and an effective task sharing mechanism further balances the computation across PEs. Compared with a state-of-the-art SIMD-like accelerator, the proposed DSTM improves average PE utilization by 3.5x and delivers 59.7% higher overall processing throughput.
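The abstract names three techniques (fine-grained effectual-pair indexing, weight rearrangement, and inter-PE task sharing) without disclosing implementation detail here. The Python sketch below illustrates the first and third ideas in software on toy data: the function names `effectual_pairs` and `share_tasks` are hypothetical, and the greedy longest-first assignment is merely one plausible balancing policy, not the paper's actual hardware mechanism.

```python
import numpy as np

def effectual_pairs(weights, activations):
    """Keep only (index, w, a) products where both operands are nonzero.

    A software stand-in for the indexing module's role: only effectual
    pairs are forwarded, so zero operands never occupy a multiplier lane.
    """
    w = np.asarray(weights)
    a = np.asarray(activations)
    idx = np.nonzero((w != 0) & (a != 0))[0]
    return list(zip(idx, w[idx], a[idx]))

def share_tasks(filter_rows, num_pes):
    """Greedy longest-first assignment: give each row of effectual pairs
    to the currently least-loaded PE, approximating inter-PE task sharing.
    """
    queues = [[] for _ in range(num_pes)]
    loads = [0] * num_pes
    for row in sorted(filter_rows, key=len, reverse=True):
        pe = loads.index(min(loads))
        queues[pe].append(row)
        loads[pe] += len(row)
    return queues

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Sparse toy data: roughly 70% zeros in both weights and activations.
    weights = rng.integers(-3, 4, size=(4, 16)) * (rng.random((4, 16)) > 0.7)
    acts = rng.integers(0, 4, size=16) * (rng.random(16) > 0.7)
    rows = [effectual_pairs(w_row, acts) for w_row in weights]
    for pe, q in enumerate(share_tasks(rows, num_pes=2)):
        print(f"PE{pe}: {sum(len(r) for r in q)} effectual products")
```

Run as-is, the script prints how many effectual products each simulated PE receives; with the longest-first policy the two queues end up near-equal even though individual filter rows have very different nonzero counts.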


