IEEE International Conference on Acoustics, Speech and Signal Processing

Listen, attend and spell: A neural network for large vocabulary conversational speech recognition

Abstract

We present Listen, Attend and Spell (LAS), a neural speech recognizer that transcribes speech utterances directly to characters without pronunciation models, HMMs or other components of traditional speech recognizers. In LAS, the neural network architecture subsumes the acoustic, pronunciation and language models, making it not only an end-to-end trained system but an end-to-end model. In contrast to DNN-HMM, CTC and most other models, LAS makes no independence assumptions about the probability distribution of the output character sequences given the acoustic sequence. Our system has two components: a listener and a speller. The listener is a pyramidal recurrent network encoder that accepts filter bank spectra as inputs. The speller is an attention-based recurrent network decoder that emits each character conditioned on all previous characters and the entire acoustic sequence. On a Google voice search task, LAS achieves a WER of 14.1% without a dictionary or an external language model, and 10.3% with language model rescoring over the top 32 beams. In comparison, the state-of-the-art CLDNN-HMM model achieves a WER of 8.0% on the same set.
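
The listener/speller structure described in the abstract can be sketched in a few dozen lines. The following is a minimal PyTorch sketch of a pyramidal recurrent encoder and an attention-based character decoder; the layer sizes, the use of LSTM cells, and the dot-product attention scoring are illustrative assumptions, not necessarily the paper's exact configuration.

```python
# Minimal sketch of a Listen, Attend and Spell style model.
# Dimensions, LSTM cells and dot-product attention are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Listener(nn.Module):
    """Pyramidal bidirectional RNN encoder over filter bank features."""

    def __init__(self, input_dim=40, hidden_dim=256, num_pyramid_layers=3):
        super().__init__()
        self.base = nn.LSTM(input_dim, hidden_dim, bidirectional=True, batch_first=True)
        # Each pyramidal layer concatenates pairs of consecutive frames,
        # halving the time resolution before the next bidirectional LSTM.
        self.pyramid = nn.ModuleList(
            [nn.LSTM(4 * hidden_dim, hidden_dim, bidirectional=True, batch_first=True)
             for _ in range(num_pyramid_layers)]
        )

    def forward(self, x):                      # x: (batch, time, input_dim)
        h, _ = self.base(x)
        for layer in self.pyramid:
            if h.size(1) % 2 == 1:             # pad to an even number of frames
                h = F.pad(h, (0, 0, 0, 1))
            b, t, d = h.shape
            h = h.reshape(b, t // 2, 2 * d)    # concatenate adjacent frames
            h, _ = layer(h)
        return h                               # (batch, time / 2**L, 2 * hidden_dim)


class Speller(nn.Module):
    """Attention-based RNN decoder that emits one character at a time."""

    def __init__(self, vocab_size, enc_dim=512, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.rnn = nn.LSTMCell(hidden_dim + enc_dim, hidden_dim)
        self.query = nn.Linear(hidden_dim, enc_dim)
        self.out = nn.Linear(hidden_dim + enc_dim, vocab_size)

    def forward(self, enc, targets):           # enc: (B, T, enc_dim); targets: (B, U)
        B = enc.size(0)
        h = enc.new_zeros(B, self.rnn.hidden_size)
        c = enc.new_zeros(B, self.rnn.hidden_size)
        context = enc.new_zeros(B, enc.size(2))
        logits = []
        for u in range(targets.size(1)):        # teacher forcing on previous characters
            emb = self.embed(targets[:, u])
            h, c = self.rnn(torch.cat([emb, context], dim=-1), (h, c))
            # Content-based attention over every encoder time step.
            scores = torch.bmm(enc, self.query(h).unsqueeze(-1)).squeeze(-1)
            context = torch.bmm(F.softmax(scores, dim=-1).unsqueeze(1), enc).squeeze(1)
            logits.append(self.out(torch.cat([h, context], dim=-1)))
        return torch.stack(logits, dim=1)       # (B, U, vocab_size)


# Toy forward pass: 40-dim filter bank frames in, character logits out.
listener, speller = Listener(), Speller(vocab_size=30)
feats = torch.randn(2, 200, 40)                 # (batch, frames, mel bins)
chars = torch.randint(0, 30, (2, 12))           # previous characters (teacher forcing)
print(speller(listener(feats), chars).shape)    # torch.Size([2, 12, 30])
```

The pyramidal structure is the key design choice here: each pyramid layer halves the number of time steps, so the speller's attention operates over a much shorter encoded sequence than the raw frame sequence.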
