Published in: International Conference on Speech and Computer

Language Model Optimization for a Deep Neural Network Based Speech Recognition System for Serbian

Abstract

This paper presents results obtained with several variants of trigram language models in a large-vocabulary continuous speech recognition (LVCSR) system for Serbian, based on the deep neural network (DNN) framework implemented within the Kaldi speech recognition toolkit. This training approach allows parallelization across several threads on either multiple GPUs or multiple CPUs, and applies a natural-gradient modification to the stochastic gradient descent (SGD) optimization method. Acoustic models are trained over a fixed number of epochs, with parameter averaging at the end. The paper discusses recognition using language models trained with Kneser-Ney or Good-Turing smoothing, as well as several pruning parameter values. Results on a test set containing more than 120,000 words and various utterance types are analyzed and compared to reference results obtained with speaker-adapted GMM-HMM models on the same speech database. Online and offline recognition results are also compared. Finally, the effect of additional discriminative training using a language model prior to the DNN stage is explored.
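The abstract does not reproduce the smoothing formulas. As an illustration of the Kneser-Ney method it mentions, the following is a minimal sketch of an interpolated Kneser-Ney bigram model (the trigram case used in the paper is analogous); the toy corpus, the discount value 0.75, and the function name are illustrative assumptions, not taken from the paper:

```python
from collections import Counter

def kneser_ney_bigram(tokens, discount=0.75):
    """Build an interpolated Kneser-Ney bigram model from a token list.

    P(w|u) = max(c(u,w) - d, 0)/c(u) + lambda(u) * P_cont(w), where the
    continuation probability P_cont(w) counts how many distinct contexts
    w follows, rather than how often w occurs.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])
    # Continuation counts: number of distinct left contexts for each word.
    continuation = Counter(w for (_, w) in bigrams)
    total_bigram_types = len(bigrams)

    def prob(w, context):
        p_cont = continuation[w] / total_bigram_types
        c_u = context_counts[context]
        if c_u == 0:
            return p_cont  # unseen context: back off to continuation prob
        n1plus = sum(1 for (u, _) in bigrams if u == context)
        lam = discount * n1plus / c_u  # mass reserved by discounting
        return max(bigrams[(context, w)] - discount, 0) / c_u + lam * p_cont

    return prob

# Toy usage: probabilities for a given context sum to 1 over the vocabulary.
tokens = "the cat sat on the mat the cat ran".split()
model = kneser_ney_bigram(tokens)
total = sum(model(w, "the") for w in set(tokens))
```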
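The parallel training scheme described above (several SGD threads or jobs, with parameter averaging at the end of each epoch) can be sketched in pure Python; the function name and the toy parameter shapes are assumptions for illustration, not Kaldi's actual implementation:

```python
def average_parameters(replicas):
    """Element-wise average of matched parameter lists from parallel jobs.

    `replicas` is a list of per-job parameter sets (each a list of flat
    parameter vectors); the averaged result would serve as the starting
    point for the next training epoch.
    """
    n = len(replicas)
    return [
        [sum(values) / n for values in zip(*tensors)]
        for tensors in zip(*replicas)
    ]

# Toy usage: two jobs, each holding one weight vector and one bias vector.
job_a = [[1.0, 2.0, 3.0], [0.0, 2.0]]
job_b = [[3.0, 4.0, 5.0], [2.0, 0.0]]
averaged = average_parameters([job_a, job_b])
# averaged == [[2.0, 3.0, 4.0], [1.0, 1.0]]
```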
