IEEE International Conference on Acoustics, Speech and Signal Processing

Predicting error rates for unknown data in automatic speech recognition


Abstract

In this paper we investigate methods to predict word error rates in automatic speech recognition in the presence of unknown noise types, i.e., noise types that were not seen during training. The performance measures operate on phoneme posteriorgrams that are obtained from neural nets. We compare average frame-wise entropy as a baseline measure to the mean temporal distance (M-Measure) and to the number of phonetic events. The latter is obtained by learning typical phoneme activations from clean training data, which are later applied as phoneme-specific matched filters to posteriorgrams (MaP). When the filtered output exceeds a threshold, we register this as a phonetic event. For test sets using 10 unknown noise types and a wide range of signal-to-noise ratios, we find M-Measure and MaP to produce predictions twice as accurate as the baseline measure. When excluding noise types that contain speech segments, a prediction error of 3.1% is achieved, compared to 15.0% for the baseline measure.
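The three performance measures named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the symmetric KL divergence used as the frame distance, the set of time lags, the filter templates, and the threshold value are all assumptions chosen for illustration. A posteriorgram is assumed to be an array of shape (frames, phonemes) whose rows sum to 1.

```python
import numpy as np

def mean_frame_entropy(post, eps=1e-12):
    """Baseline: entropy (in nats) of each frame, averaged over all frames."""
    p = np.clip(post, eps, 1.0)
    return float(np.mean(-np.sum(p * np.log(p), axis=1)))

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between rows of p and q (assumed distance)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum((p - q) * (np.log(p) - np.log(q)), axis=-1)

def m_measure(post, lags=(1, 2, 5, 10)):
    """Mean temporal distance: average divergence between frames that are
    `lag` frames apart, averaged over a (hypothetical) set of lags."""
    per_lag = [
        np.mean(symmetric_kl(post[lag:], post[:-lag]))
        for lag in lags
        if lag < len(post)
    ]
    return float(np.mean(per_lag))

def count_phonetic_events(post, filters, threshold=0.9):
    """MaP-style count: correlate each phoneme track with its template and
    count how often the filtered output rises above the threshold.

    `filters` maps a phoneme index to a 1-D template; in the paper these
    are learned from clean training data, here they are supplied by hand.
    """
    events = 0
    for phoneme, template in filters.items():
        filtered = np.correlate(post[:, phoneme], template, mode="same")
        above = filtered > threshold
        # Count rising edges so one sustained activation is one event.
        events += int(above[0]) + int(np.sum(above[1:] & ~above[:-1]))
    return events
```

Under these assumptions, a clean, confident posteriorgram tends to give low entropy, a large mean temporal distance, and many phonetic events, while a noisy one tends toward the opposite. Mapping any of the three values to a predicted word error rate would require a separate regression step that is not shown here.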
