Conference: International Conference on Speech and Computer

Building Real-Time Speech Recognition Without CMVN


Abstract

Estimating cepstral mean and variance normalization (CMVN) statistics in run-on and real-time settings poses several challenges. Estimating the mean and variance with a moving average requires a comparatively long history of data from a speaker, which is not appropriate for short utterances or conversations. Using a pre-estimated global CMVN for speakers instead reduces recognition performance because of potential mismatch between training and test data. This paper investigates how to build a real-time, run-on speech recognition system using acoustic features without applying CMVN. We propose a feature extraction architecture that transforms unnormalized log mel features into normalized bottleneck features without using historical data. We show empirically that mean and variance normalization is not critical for training neural networks on speech data. Using the proposed feature extraction, we achieved a 4.1% word error rate reduction compared to global CMVN on the Skype conversations test set. We also reveal many cases in which features without zero mean can be learnt well by neural networks, which stands in contrast to prior work.
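For readers unfamiliar with the two baselines the abstract contrasts, the following is a minimal Python sketch of global CMVN and of a moving-average CMVN. It is not the paper's implementation: the function names, the use of an exponentially decayed average as the "moving average", and the decay value `alpha` are all assumptions made for illustration.

```python
import math


def global_cmvn(frames, mean, std):
    """Normalize frames with pre-estimated, fixed per-dimension statistics.

    `mean` and `std` are assumed to have been computed offline on training data.
    """
    return [[(x - m) / s for x, m, s in zip(frame, mean, std)] for frame in frames]


def running_cmvn(frames, alpha=0.995, eps=1e-8):
    """Normalize frames online with exponentially decayed mean/variance estimates.

    Only the current and past frames are used, so this can run in real time.
    The exponential decay is an illustrative choice, not taken from the paper.
    """
    dim = len(frames[0])
    mean = list(frames[0])   # initialize the mean from the first frame
    var = [1.0] * dim        # arbitrary initial variance (assumption)
    out = []
    for frame in frames:
        normalized = []
        for d, x in enumerate(frame):
            # Update the running statistics, then normalize the current value.
            mean[d] = alpha * mean[d] + (1.0 - alpha) * x
            var[d] = alpha * var[d] + (1.0 - alpha) * (x - mean[d]) ** 2
            normalized.append((x - mean[d]) / math.sqrt(var[d] + eps))
        out.append(normalized)
    return out
```

With a decay such as `alpha = 0.995`, the running estimates effectively average over roughly `1 / (1 - alpha) = 200` frames, so for a short utterance they remain dominated by their initialization. This is one way to see the "long history" problem the abstract describes, while `global_cmvn` avoids it at the cost of using statistics that may not match the test speaker.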
