Conference: Annual Conference of the International Speech Communication Association

Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition


Abstract

The use of Deep Belief Networks (DBN) to pretrain neural networks has recently led to a resurgence of Artificial Neural Network - Hidden Markov Model (ANN/HMM) hybrid systems for Automatic Speech Recognition (ASR). In this paper we report results of a DBN-pretrained context-dependent ANN/HMM system trained on two datasets that are much larger than any previously reported with DBN-pretrained ANN/HMM systems: 5870 hours of Voice Search and 1400 hours of YouTube data. On the first dataset, the pretrained ANN/HMM system outperforms the best Gaussian Mixture Model - Hidden Markov Model (GMM/HMM) baseline, built with a much larger dataset, by 3.7% absolute WER, while on the second dataset it outperforms the GMM/HMM baseline by 4.7% absolute. Maximum Mutual Information (MMI) fine-tuning and model combination using Segmental Conditional Random Fields (SCARF) give additional gains of 0.1% and 0.4% on the first dataset and 0.5% and 0.9% absolute on the second dataset.
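The DBN pretraining the abstract refers to is greedy layer-wise training: each layer is fit as a Restricted Boltzmann Machine (RBM) on the hidden activations of the layer below, and the learned weights then initialize the ANN before supervised (and later MMI) fine-tuning. A minimal illustrative sketch of that procedure, using NumPy and one-step contrastive divergence (CD-1), is shown below; the layer sizes, learning rate, and toy data are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """One Restricted Boltzmann Machine layer, trained with CD-1."""
    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.05):
        # Positive phase: hidden activations driven by the data.
        h0_prob = self.hidden_probs(v0)
        h0 = (self.rng.random(h0_prob.shape) < h0_prob).astype(float)
        # Negative phase: one Gibbs step back to a "reconstruction".
        v1_prob = self.visible_probs(h0)
        h1_prob = self.hidden_probs(v1_prob)
        # Approximate gradient: data statistics minus model statistics.
        batch = v0.shape[0]
        self.W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
        self.b_v += lr * (v0 - v1_prob).mean(axis=0)
        self.b_h += lr * (h0_prob - h1_prob).mean(axis=0)

def pretrain_stack(data, layer_sizes, epochs=5, rng=None):
    """Greedy layer-wise pretraining: train each RBM on the hidden
    activations of the layer below, then feed activations upward."""
    rng = rng or np.random.default_rng(0)
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(x)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)
    return rbms

# Toy usage on random binary "features" (hypothetical stand-in for
# acoustic feature vectors).
rng = np.random.default_rng(0)
data = (rng.random((64, 20)) < 0.5).astype(float)
stack = pretrain_stack(data, [16, 8], rng=rng)
print([r.W.shape for r in stack])
```

In the hybrid system, the pretrained weights of such a stack would initialize the hidden layers of the ANN, with a softmax output layer over context-dependent HMM states added on top before supervised training.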
