IEEE Workshop on Spoken Language Technology

Context-dependent Deep Neural Networks for audio indexing of real-life data

Abstract

We apply Context-Dependent Deep-Neural-Network HMMs (CD-DNN-HMMs) to the real-life problem of audio indexing of data from various sources. We recently showed that, on the Switchboard benchmark for speaker-independent transcription of phone calls, CD-DNN-HMMs with 7 hidden layers reduce the word error rate (WER) by as much as one-third compared to discriminatively trained Gaussian-mixture HMMs (GMM-HMMs), and by one-fourth if the GMM-HMM also uses fMPE features. This paper takes CD-DNN-HMM-based recognition into a real-life deployment for audio indexing. We find that for our best speaker-independent CD-DNN-HMM, with 32k senones trained on 2000 h of data, the one-fourth reduction does carry over to inhomogeneous field data (video podcasts and talks). Compared to a speaker-adaptive GMM system, the relative improvement is 18%, at very similar end-to-end runtime. In system building, we find that DNNs can benefit from a larger number of senones than the GMM-HMM, and that DNN likelihood evaluation is a sizeable runtime factor even in our wide-beam context of generating rich lattices: cutting the model size by 60% reduces runtime by one-third at a 5% relative WER loss.
