Context-dependent Deep Neural Networks for audio indexing of real-life data

Abstract

We apply Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, to the real-life problem of audio indexing of data across various sources. Recently, we had shown that on the Switchboard benchmark on speaker-independent transcription of phone calls, CD-DNN-HMMs with 7 hidden layers reduce the word error rate by as much as one-third, compared to discriminatively trained Gaussian-mixture HMMs, and by one-fourth if the GMM-HMM also uses fMPE features. This paper takes CD-DNN-HMM based recognition into a real-life deployment for audio indexing. We find that for our best speaker-independent CD-DNN-HMM, with 32k senones trained on 2000h of data, the one-fourth reduction does carry over to inhomogeneous field data (video podcasts and talks). Compared to a speaker-adaptive GMM system, the relative improvement is 18%, at very similar end-to-end runtime. In system building, we find that DNNs can benefit from a larger number of senones than the GMM-HMM; and that DNN likelihood evaluation is a sizeable runtime factor even in our wide-beam context of generating rich lattices: Cutting the model size by 60% reduces runtime by one-third at a 5% relative WER loss.
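The figures in the abstract are relative changes in word error rate (WER), not absolute percentage points. As a minimal sketch of that arithmetic, the snippet below computes a relative reduction and applies a relative loss. The function names and all WER values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction of new_wer against baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def apply_relative_loss(wer: float, relative_loss: float) -> float:
    """Return the WER after it degrades by the given relative fraction."""
    return wer * (1.0 + relative_loss)


# Hypothetical WER values (in percent), for illustration only.
baseline = 24.0   # assumed baseline system
improved = 18.0   # assumed improved system

# (24.0 - 18.0) / 24.0 is a one-fourth relative reduction.
print(f"relative reduction: {relative_reduction(baseline, improved):.0%}")

# A 5% relative loss raises 18.0 to about 18.9, not to 23.0.
print(f"after 5% relative loss: {apply_relative_loss(improved, 0.05):.1f}")
```

Read this way, a 5% relative WER loss is a small change compared with the one-third runtime saving it is traded against in the abstract.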

Bibliographic details

Source
IEEE Workshop on Spoken Language Technology | 2012 | 6 pages
Authors
Li Gang; Zhu Huifeng; Cheng Gong; Thambiratnam Kit; Chitsaz Behrooz; Yu Dong; Seide Frank;
Original format: PDF
Chinese Library Classification: TN912-53
Keywords
audio indexing; deep learning; deep neural networks; speech recognition;

