IEEE Journal of Solid-State Circuits

An 8.93 TOPS/W LSTM Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity for On-Device Speech Recognition



Abstract

Long short-term memory (LSTM) is a type of recurrent neural network (RNN) widely used for time-series data and speech applications due to its high accuracy on such tasks. However, LSTMs are difficult to implement efficiently in hardware because they require a large amount of weight storage and exhibit high computational complexity. Prior work has proposed compression techniques to alleviate the storage and computation requirements of LSTMs, but elementwise sparsity schemes incur sizable index memory overhead, and structured compression techniques report only limited compression ratios. In this article, we present an energy-efficient LSTM RNN accelerator featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based blockwise recursive weight compression, we demonstrate LSTM networks with up to 16x fewer weights while achieving minimal error rate degradation. The prototype chip, fabricated in 65-nm LP CMOS, achieves up to 8.93 TOPS/W for real-time speech recognition using compressed LSTMs based on HCGS. HCGS-based LSTMs have demonstrated energy-efficient speech recognition with low error rates on the TIMIT, TED-LIUM, and LibriSpeech data sets.
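The abstract does not spell out the HCGS algorithm, but the idea of hierarchical blockwise selection can be illustrated with a small sketch. The NumPy snippet below is a minimal illustration under assumed details, not the authors' implementation: it selects blocks at random, whereas the paper presumably chooses them via training in a hardware-friendly, structured way; the function name `hcgs_mask`, the block sizes, and the keep fractions are all hypothetical.

```python
import numpy as np

def hcgs_mask(shape, block_sizes, keep_fracs, seed=0):
    """Sketch of a hierarchical coarse-grain sparsity mask (coarse to fine).

    Each level tiles the matrix into square blocks of edge block_sizes[i]
    and keeps roughly keep_fracs[i] of the blocks that survived all
    coarser levels. Finer block sizes must divide the coarser ones.
    """
    rng = np.random.default_rng(seed)
    rows, cols = shape
    mask = np.ones(shape, dtype=np.float32)
    for bs, frac in zip(block_sizes, keep_fracs):
        n_r, n_c = rows // bs, cols // bs
        alive = mask[::bs, ::bs] > 0                  # blocks surviving coarser levels
        keep = (rng.random((n_r, n_c)) < frac) & alive
        mask = np.kron(keep.astype(np.float32),       # expand block flags to elements
                       np.ones((bs, bs), dtype=np.float32))
    return mask

# Two levels keeping 1/4 of the blocks each -> 1/16 of the weights remain,
# matching the abstract's headline 16x weight reduction.
mask = hcgs_mask((512, 512), block_sizes=(64, 16), keep_fracs=(0.25, 0.25))
print(f"weight density: {mask.mean():.4f}")          # ~0.0625
```

Applying such a mask elementwise to a weight matrix zeroes entire blocks, so only one index per surviving block needs to be stored rather than one per nonzero element; this is the mechanism by which a blockwise scheme avoids the index memory overhead that the abstract attributes to elementwise sparsity.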