IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Accelerating Recurrent Neural Networks: A Memory-Efficient Approach



Abstract

Recurrent neural networks (RNNs) have achieved state-of-the-art performance on various sequence learning tasks due to their powerful sequence modeling capability. However, RNNs usually require a large number of parameters and have high computational complexity. Hence, it is quite challenging to implement complex RNNs on embedded devices with stringent memory and latency requirements. In this paper, we first present a novel hybrid compression method for a widely used RNN variant, long short-term memory (LSTM), to tackle these implementation challenges. By properly using circulant matrices, forward nonlinear function approximation, and efficient quantization schemes with a retrain-based training strategy, the proposed compression method reduces memory usage by more than 95% with negligible accuracy loss, as verified on language modeling and speech recognition tasks. An efficient, scalable parallel hardware architecture is then proposed for the compressed LSTM. With an innovative chessboard division method for matrix-vector multiplications, the parallelism of the proposed hardware architecture can be freely chosen under a given latency requirement. Specifically, for the circulant matrix-vector multiplications employed in the compressed LSTM, the circulant matrices are judiciously reorganized to fit the chessboard division and minimize the number of memory accesses required for the matrix multiplications. The proposed architecture is modeled in register transfer language (RTL) and synthesized in the TSMC 90-nm CMOS technology. With 518.5 kB of on-chip memory, we are able to process a 512×512 compressed LSTM in 1.71 μs, corresponding to 2.46 TOPS on the uncompressed one, at a cost of 30.77-mm² chip area. The implementation results demonstrate that the proposed design achieves high flexibility and area efficiency, satisfying the requirements of many real-time applications on embedded devices. It is worth mentioning that the memory-efficient approach to accelerating LSTMs developed in this paper is also applicable to other RNN variants.
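The circulant weight matrices mentioned in the abstract are the main source of the memory reduction: an n×n circulant block is fully determined by its first column and can be multiplied with a vector in O(n log n) time via the FFT. The sketch below is a minimal NumPy illustration of this property only, not the paper's hardware method; the function names, block size, and block-circulant layout are assumptions made for the example.

```python
import numpy as np

def circulant_matvec(c, x):
    # Multiply the n x n circulant matrix whose first column is c by x.
    # Only the length-n vector c is stored instead of the full n x n
    # matrix, which is where the memory saving comes from; the product
    # equals the circular convolution of c and x, computed via the FFT.
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(first_cols, x, block):
    # Multiply a block-circulant weight matrix by x. first_cols[i][j] is
    # the first column (length `block`) of the circulant block in block-row
    # i and block-column j, so an (m*block) x (n*block) matrix is stored
    # with only m*n*block parameters instead of m*n*block*block.
    m, n = len(first_cols), len(first_cols[0])
    y = np.zeros(m * block)
    for i in range(m):
        acc = np.zeros(block)
        for j in range(n):
            acc += circulant_matvec(first_cols[i][j], x[j * block:(j + 1) * block])
        y[i * block:(i + 1) * block] = acc
    return y

# Quick sanity check against an explicitly built circulant matrix.
n = 8
c, x = np.random.randn(n), np.random.randn(n)
C = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])
assert np.allclose(C @ x, circulant_matvec(c, x))
```

Storing only first columns of circulant blocks is consistent with the large memory reduction the abstract reports, although the paper reaches the >95% figure by combining this with quantization and nonlinear function approximation.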
