Published in: International Conference on Speech and Computer

A Comparison of Language Model Training Techniques in a Continuous Speech Recognition System for Serbian



Abstract

In this paper, several language model training techniques are examined and applied in a large-vocabulary continuous speech recognition system for the Serbian language (more than 120,000 words), namely the Mikolov and Yandex RNNLM toolkits, TensorFlow-based GPU approaches, and the CUED-RNNLM approach. The baseline acoustic model is a chain sub-sampled time-delay neural network, trained with cross-entropy training and a sequence-level objective function on a database of about 200 hours of speech. The baseline language model is a 3-gram model trained on the training portion of the database transcriptions and the Serbian journalistic corpus (about 600,000 utterances), using the SRILM toolkit and Kneser-Ney smoothing with a pruning value of 10^(-7) (previous best). The results are analyzed in terms of word and character error rates and the perplexity of a given language model on the training and validation sets. A relative improvement of 22.4% (best word error rate of 7.25%) is obtained in comparison to the baseline language model.
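The metrics reported above are word error rate and perplexity. As a self-contained sketch (not the authors' evaluation code), word error rate can be computed as the Levenshtein edit distance over word sequences divided by the reference length, and perplexity as the exponentiated average negative log-probability per token:

```python
import math

def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between first i reference words and first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dp[i - 1][j] + 1
            insertion = dp[i][j - 1] + 1
            dp[i][j] = min(substitution, deletion, insertion)
    return dp[len(ref)][len(hyp)] / len(ref)

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities assigned by an LM."""
    return math.exp(-sum(log_probs) / len(log_probs))
```

For the baseline n-gram model itself, SRILM's `ngram-count` supports the setup the abstract describes (e.g. `-order 3 -kndiscount -interpolate -prune 1e-7`), and `ngram -ppl` evaluates perplexity on a held-out set; the character error rate is the same edit-distance computation applied to character rather than word sequences.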


