Predicting error rates for unknown data in automatic speech recognition

Abstract

In this paper we investigate methods to predict word error rates in automatic speech recognition in the presence of unknown noise types, i.e., noise types that were not seen during training. The performance measures operate on phoneme posteriorgrams obtained from neural networks. We compare average frame-wise entropy as a baseline measure to the mean temporal distance (M-Measure) and to the number of phonetic events. The latter is obtained by learning typical phoneme activations from clean training data, which are later applied as phoneme-specific matched filters to posteriorgrams (MaP). When the filtered output exceeds a threshold, we register this as a phonetic event. For test sets using 10 unknown noise types and a wide range of signal-to-noise ratios, we find that M-Measure and MaP produce predictions twice as accurate as the baseline measure. When excluding noise types that contain speech segments, a prediction error of 3.1% is achieved, compared to 15.0% for the baseline measure.
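
The three measures named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the symmetric Kullback-Leibler divergence used as the frame distance, the boxcar templates standing in for learned matched filters, the threshold of 0.5, and all function names are assumptions made for illustration only.

```python
import numpy as np

EPS = 1e-12  # guards against log(0); an assumption of this sketch


def mean_framewise_entropy(post):
    """Average per-frame entropy (nats) of a (frames x phonemes) posteriorgram."""
    p = np.clip(post, EPS, 1.0)
    return float(np.mean(-np.sum(p * np.log(p), axis=1)))


def mean_temporal_distance(post, delta):
    """Mean divergence between frames `delta` steps apart.

    Symmetric KL divergence is assumed as the frame distance here.
    """
    p = np.clip(post[:-delta], EPS, 1.0)
    q = np.clip(post[delta:], EPS, 1.0)
    # KL(p||q) + KL(q||p) simplifies to sum((p - q) * (log p - log q))
    sym_kl = np.sum((p - q) * (np.log(p) - np.log(q)), axis=1)
    return float(np.mean(sym_kl))


def count_phonetic_events(post, filters, threshold):
    """Count upward threshold crossings after per-phoneme filtering.

    `filters[k]` is a 1-D template for phoneme k. In the paper these are
    learned from clean training data; here they are hypothetical inputs.
    """
    events = 0
    for k, template in enumerate(filters):
        response = np.correlate(post[:, k], template, mode="same")
        above = response > threshold
        # one event per rising edge, plus one if the signal starts above
        events += int(above[0]) + int(np.sum(above[1:] & ~above[:-1]))
    return events


# Toy data: sharp, segment-like posteriors versus completely flat ones.
clean = np.eye(4)[np.repeat([0, 1, 2, 3], 10)]  # 40 frames, 4 phonemes
noisy = np.full((40, 4), 0.25)
boxcar_filters = [np.ones(5) / 5] * 4            # placeholder templates

for name, post in [("clean", clean), ("noisy", noisy)]:
    print(name,
          mean_framewise_entropy(post),
          mean_temporal_distance(post, delta=5),
          count_phonetic_events(post, boxcar_filters, threshold=0.5))
```

On the toy data, the sharp posteriorgram yields low entropy, a large temporal distance, and one event per phoneme segment, while the flat posteriorgram yields maximal entropy, zero temporal distance, and no events. How such scores are mapped to a predicted word error rate is not described in the abstract and is left out of the sketch.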

Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2017 | pp. 5330-5334 | 5 pages
Authors
Bernd T. Meyer; Sri Harish Mallidi; Hendrik Kayser; Hynek Hermansky
Original format: PDF
Keywords
Entropy; Noise measurement; Measurement uncertainty; Error analysis; Training data; Signal to noise ratio; Automatic speech recognition
