Building DNN acoustic models for large vocabulary speech recognition

Andrew L. Maas; Peng Qi; Ziang Xie; Awni Y. Hannun; Christopher T. Lengerich; Daniel Jurafsky; Andrew Y. Ng


Abstract

Understanding architectural choices for deep neural networks (DNNs) is crucial to improving state-of-the-art speech recognition systems. We investigate which aspects of DNN acoustic model design are most important for speech recognition system performance, focusing on feed-forward networks. We study the effects of parameters like model size (number of layers, total parameters), architecture (convolutional networks), and training details (loss function, regularization methods) on DNN classifier performance and speech recognizer word error rates. On the Switchboard benchmark corpus we compare standard DNNs to convolutional networks, and present the first experiments using locally-connected, untied neural networks for acoustic modeling. Using a much larger 2100-hour training corpus (combining Switchboard and Fisher) we examine the performance of very large DNN models - with up to ten times more parameters than those typically used in speech recognition systems. The results suggest that a relatively simple DNN architecture and optimization technique give strong performance, and we offer intuitions about architectural choices like network depth over breadth. Our findings extend previous works to help establish a set of best practices for building DNN hybrid speech recognition systems and constitute an important first step toward analyzing more complex recurrent, sequence-discriminative, and HMM-free architectures.
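
One way to make the abstract's comparison of network depth and breadth concrete is to count the parameters of a fully connected feed-forward network. The Python sketch below is illustrative only: the function name `dnn_param_count` and every layer size are assumptions chosen for the example, not values reported in the paper.

```python
def dnn_param_count(input_dim, hidden_width, num_hidden_layers, output_dim):
    """Count weights and biases in a fully connected feed-forward network."""
    # Layer sizes from input to output, e.g. [300, 2048, 2048, 8000].
    dims = [input_dim] + [hidden_width] * num_hidden_layers + [output_dim]
    # Each layer contributes a (d_in x d_out) weight matrix plus d_out biases.
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims[:-1], dims[1:]))


if __name__ == "__main__":
    # Hypothetical configurations (not taken from the paper).
    wide = dnn_param_count(input_dim=300, hidden_width=4096,
                           num_hidden_layers=2, output_dim=8000)
    deep = dnn_param_count(input_dim=300, hidden_width=2048,
                           num_hidden_layers=5, output_dim=8000)
    print(f"2 hidden layers x 4096 units: {wide:,} parameters")
    print(f"5 hidden layers x 2048 units: {deep:,} parameters")
```

Holding the total roughly fixed while varying `num_hidden_layers` and `hidden_width` is one simple way to frame a depth-versus-breadth comparison.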


Bibliographic record

Source
Computer Speech and Language | 2017, Issue 1 | pp. 195-213 | 19 pages
Authors
Andrew L. Maas; Peng Qi; Ziang Xie; Awni Y. Hannun; Christopher T. Lengerich; Daniel Jurafsky; Andrew Y. Ng
Author affiliations
Stanford University, Stanford, CA 94305, USA (all seven authors)

Indexed in
Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format
PDF
Language
English
Keywords
Hidden Markov model deep neural network (HMM-DNN); Neural networks; Acoustic modeling; Speech recognition; Large vocabulary continuous speech recognition (LVCSR);


Similar literature

1. A comparative study on selecting acoustic modeling units in deep neural networks based large vocabulary Chinese speech recognition [J]. Li Xiangang, Yang Yuning, Pang Zaihu. Neurocomputing, 2015, issue deca25.
2. Boosting HMM acoustic models in large vocabulary speech recognition [J]. Meyer C, Schramm H. Speech Communication, 2006, issue 5.
3. Unsupervised training of acoustic models for large vocabulary continuous speech recognition [J]. Wessel F., Ney H. IEEE Transactions on Speech and Audio Processing, 2005, issue 1.
4. Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: Why DNN surpasses GMMs in acoustic modeling [C]. Pan Jia, Liu Cong, Wang Zhiguo. 2012 8th International Symposium on Chinese Spoken Language Processing, 2012.
5. Statistical optimization of acoustic models for large vocabulary speech recognition [D]. Hu, Rusheng. 2006.
6. Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition [O]. Edvin Pakoci, Branislav Popović, Darko Pekar. 2019.
7. Building DNN Acoustic Models for Large Vocabulary Speech Recognition [O]. Maas, Andrew L., Qi, Peng, Xie, Ziang. 2015.
