Published in: IEEE Transactions on Audio, Speech, and Language Processing

Performance Estimation of Speech Recognition System Under Noise Conditions Using Objective Quality Measures and Artificial Voice

Abstract

It is essential to ensure quality of service (QoS) when offering a speech recognition service for use in noisy environments, which means that recognition performance in the target noise environment must be investigated. One approach is to estimate the recognition performance from a distortion value, which represents the difference between noisy speech and its original clean version. Estimation methods using the segmental signal-to-noise ratio (SNRseg), the cepstral distance (CD), and the perceptual evaluation of speech quality (PESQ) have previously been proposed. However, their estimation accuracy has not been verified for the case in which a noise reduction algorithm is adopted as a preprocessing stage in speech recognition. We therefore evaluated the effectiveness of these distortion measures in experiments using the AURORA-2J connected digit recognition task and four different noise reduction algorithms. The results showed that, in each case, the distortion measure correlates well with the word accuracy when the estimators are optimized for each individual noise reduction algorithm. In addition, when a single estimator optimized for all the noise reduction algorithms is used, the PESQ method gives a more accurate estimate than SNRseg and CD. Furthermore, we propose using an artificial voice of several seconds' duration instead of a large amount of real speech, and we confirmed that a relatively accurate estimate can be obtained with the artificial voice.
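
The abstract names SNRseg as one of the distortion measures but does not give its formula. As a rough illustration only, the sketch below uses a common textbook definition: the per-frame SNR in dB between the clean signal and its noisy version, clamped to a fixed range and averaged over frames. The function name `segmental_snr`, the frame length, and the clamping limits are assumptions chosen for the example, not values taken from the paper.

```python
import math


def segmental_snr(clean, noisy, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Mean per-frame SNR in dB between a clean signal and its noisy version.

    frame_len, floor_db and ceil_db are illustrative assumptions,
    not parameters reported in the paper.
    """
    if len(clean) != len(noisy):
        raise ValueError("signals must have the same length")
    n_frames = len(clean) // frame_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    eps = 1e-12  # guards against division by zero and log of zero
    total = 0.0
    for i in range(n_frames):
        start = i * frame_len
        c = clean[start:start + frame_len]
        d = noisy[start:start + frame_len]
        signal_energy = sum(x * x for x in c)
        noise_energy = sum((y - x) ** 2 for x, y in zip(c, d))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        total += min(max(snr_db, floor_db), ceil_db)  # clamp outlier frames
    return total / n_frames


# Synthetic stand-ins for clean and noisy speech: a unit sine plus a
# disturbance with one tenth of its amplitude.
clean = [math.sin(2 * math.pi * k / 32) for k in range(1024)]
noisy = [x + 0.1 * math.cos(2 * math.pi * k / 32) for k, x in enumerate(clean)]
print(round(segmental_snr(clean, noisy), 3))
```

A value computed this way would be the input to an estimator that maps distortion to word accuracy. The form of that estimator is not described in the abstract, so it is left out here.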
