Space-Time Residual LSTM Architechture for Distant Speech Recognition

Abstract

Long Short-Term Memory (Plain-LSTM) networks are effective for acoustic modeling in automatic speech recognition systems, but their training is hindered by vanishing and exploding gradients. To alleviate the problem, the paper introduces an improved space residual LSTM (S-RES-LSTM), which, unlike the previous RES-LSTM, takes the output before rather than after the LSTM projection layer as the spatial shortcut connection. Distant speech recognition experiments on AMI SDM show that, with 9 layers on the eval set, S-RES-LSTM reduces WER by 5% absolute (over) and 5.9% absolute (non-over) compared with the Plain-LSTM, and by 0.6% absolute compared with the RES-LSTM. To further enhance the information flow in S-RES-LSTM, the space and time residual LSTM (ST-RES-LSTM) is proposed, which adds a novel residual connection in the temporal dimension. Experiments show that, compared with the 9-layer Plain-LSTM and RES-LSTM, ST-RES-LSTM reduces WER (over) on the eval set by 5.5% and 1% absolute, respectively.
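
The abstract gives no equations, so the following is only a minimal sketch of one possible reading of the difference between the two spatial shortcuts: a shortcut added after the projection layer (RES-LSTM style) versus a shortcut taken from, and added to, the cell output before the projection (S-RES-LSTM style). The function names, dimensions, weight initialisation, and the omission of biases and peepholes are all assumptions made for illustration; the temporal residual connection of ST-RES-LSTM is not modelled.

```python
import math
import random

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_cell, n_proj, rng):
    """Random weights for one projected LSTM layer (hypothetical setup, no biases)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    W = {gate: mat(n_cell, n_in + n_proj) for gate in ("i", "f", "o", "g")}
    W["p"] = mat(n_proj, n_cell)  # projection matrix
    return W

def lstmp_step(x, h_prev, c_prev, W, shortcut=None, mode="plain"):
    """One time step of a projected LSTM layer.

    mode="pre":  shortcut (cell dimension) is added before the projection.
    mode="post": shortcut (projection dimension) is added after the projection.
    Returns (h, c, m), where m is the output before the projection layer.
    """
    z = x + h_prev  # list concatenation of input and recurrent projected output
    i = [sigmoid(a) for a in matvec(W["i"], z)]
    f = [sigmoid(a) for a in matvec(W["f"], z)]
    o = [sigmoid(a) for a in matvec(W["o"], z)]
    g = [math.tanh(a) for a in matvec(W["g"], z)]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    m = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    if mode == "pre":
        m = add(m, shortcut)
        h = matvec(W["p"], m)
    elif mode == "post":
        h = add(matvec(W["p"], m), shortcut)
    else:
        h = matvec(W["p"], m)
    return h, c, m

# Two stacked layers with hypothetical sizes.
rng = random.Random(0)
n_in, n_cell, n_proj = 3, 4, 2
layer1 = init_layer(n_in, n_cell, n_proj, rng)
layer2 = init_layer(n_proj, n_cell, n_proj, rng)
x = [0.5, -0.2, 0.1]
h0, c0 = [0.0] * n_proj, [0.0] * n_cell

h1, c1, m1 = lstmp_step(x, h0, c0, layer1)
# Pre-projection shortcut: layer 1's unprojected output feeds layer 2.
h2_pre, _, m2_pre = lstmp_step(h1, h0, c0, layer2, shortcut=m1, mode="pre")
# Post-projection shortcut: layer 1's projected output feeds layer 2.
h2_post, _, _ = lstmp_step(h1, h0, c0, layer2, shortcut=h1, mode="post")
```

In this sketch the pre-projection shortcut carries the wider cell-dimension vector between layers, while the post-projection shortcut carries the narrower projected vector; whether this matches the paper's exact formulation would need to be checked against the full text.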

Bibliographic details

Source
International Symposium on Chinese Spoken Language Processing, 2018, pp. 379-383 (5 pages)
Authors
Long Wu; Li Wang; Pengyuan Zhang; Ta Li; Yonghong Yan
Original format: PDF
Keywords
Speech recognition; Training; Mathematical model; Hidden Markov models; Microphones; Acoustics; Neural networks
