
On the robustness of static and dynamic spectral information for speech recognition in noise.


Abstract

Automatic speech recognition (ASR) technology has achieved a high level of performance in controlled laboratory environments, where background noise and channel variation are rather benign. In real-world applications, however, the performance of ASR systems may degrade greatly because of the mismatch between training and operating conditions.

In this thesis, we investigate the noise robustness of acoustic features in the cepstral domain, which are used successfully in most state-of-the-art ASR systems. We attempt to discern to what extent the recognition process can be made insensitive to noise by exploiting the unequal robustness of different feature components. Our approach requires neither adaptation of the acoustic models nor front-end compensation.

Dynamic cepstral features supplement static features by characterizing their temporal trajectory. It is widely known that the use of dynamic features improves speech recognition performance. However, few quantitative and systematic studies have examined the robustness of static and dynamic features for ASR in noise. In this research, by investigating the noise robustness of the static and dynamic cepstral features quantitatively, we find that the dynamic features are more robust to noise than their static counterparts. Accordingly, we propose a simple but effective noise-robust speech recognition strategy that exponentially weights the likelihoods of the static and dynamic features during the decoding process. A discriminative training procedure is developed to estimate the optimal feature weights automatically using a small amount of development data. This approach is evaluated on two connected-digit databases, one in English (Aurora 2) and the other in Cantonese (CUDigit). Significant performance improvements over the conventional unweighted baseline recognition system are attained using condition-specific weights under a variety of noise conditions. The overall relative Word Error Rate (WER) reductions are 36.55% and 41.92% for Aurora 2 and CUDigit, respectively. The proposed approach is appealing for practical applications because: (1) noise estimation is not required for feature compensation; (2) adaptation of HMMs to noisy environments is not required; (3) only a minor modification of the decoding process is needed; (4) only a few feature weights need to be trained. (Abstract shortened by UMI.)
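
The exponential weighting of static and dynamic likelihoods mentioned in the abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussian state densities, so that the frame likelihood factors into a static part and a dynamic part, and raising each part to a power becomes a linear weighting in the log domain. The function names, the toy model, and the weight values (0.6 and 1.4) are hypothetical illustrations and are not taken from the thesis; the discriminative training of the weights is not shown.

```python
import math


def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of vector x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def weighted_loglik(static, dynamic, model, w_static=1.0, w_dynamic=1.0):
    """Return log( p(static)^w_static * p(dynamic)^w_dynamic ) for one frame.

    With both weights equal to 1.0 this reduces to the ordinary
    (unweighted) log-likelihood of the concatenated feature vector.
    """
    ll_static = diag_gauss_loglik(static, model["static_mean"], model["static_var"])
    ll_dynamic = diag_gauss_loglik(dynamic, model["dynamic_mean"], model["dynamic_var"])
    return w_static * ll_static + w_dynamic * ll_dynamic


# Toy single-state model with 2 static and 2 dynamic dimensions (made-up values).
model = {
    "static_mean": [0.0, 1.0],
    "static_var": [1.0, 2.0],
    "dynamic_mean": [0.0, 0.0],
    "dynamic_var": [0.5, 0.5],
}
frame_static = [0.5, -0.2]
frame_dynamic = [0.1, 0.0]

# Unweighted baseline score for this frame.
baseline = weighted_loglik(frame_static, frame_dynamic, model)

# Hypothetical weights that emphasize the dynamic stream.
reweighted = weighted_loglik(
    frame_static, frame_dynamic, model, w_static=0.6, w_dynamic=1.4
)
```

In a decoder, a score of this kind would replace the per-frame state log-likelihood, which is consistent with the abstract's remark that only a minor modification of the decoding process is needed.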

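Bibliographic record

  • Author

    Yang, Chen.

  • Author affiliation

    The Chinese University of Hong Kong (People's Republic of China).

  • Degree-granting institution The Chinese University of Hong Kong (People's Republic of China).
  • Subject Engineering, Electronics and Electrical.
  • Degree Ph.D.
  • Year 2005
  • Pages 141 p.
  • Total pages 141
  • Original format PDF
  • Language of text eng
  • Chinese Library Classification Radio electronics, telecommunications technology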