On the robustness of static and dynamic spectral information for speech recognition in noise.

Abstract

Automatic speech recognition (ASR) technology has achieved a high level of performance in controlled laboratory environments, where background noise and channel variation are rather benign. In real-world applications, however, the performance of ASR systems may degrade greatly because of the mismatch between the training and operating conditions.

In this thesis, we investigate the noise robustness of acoustic features in the cepstral domain, which have been used successfully in most state-of-the-art ASR systems. We attempt to discern to what extent the recognition process can be made insensitive to noise by exploiting the unequal robustness of different feature components. Our approach requires neither adaptation of the acoustic models nor front-end compensation.

Dynamic cepstral features supplement static features by characterizing their temporal trajectory. It is widely known that the use of dynamic features improves speech recognition performance. However, few quantitative and systematic studies have examined the robustness of static and dynamic features for ASR in noise. In this research, by investigating the noise robustness of the static and dynamic cepstral features quantitatively, we find that the dynamic features are more robust to noise than their static counterparts. Accordingly, we propose a simple but effective noise-robust speech recognition strategy: exponentially weighting the likelihoods of the static and dynamic features during the decoding process. A discriminative training procedure is developed to estimate the optimal feature weights automatically from a small amount of development data. The approach is evaluated on two connected-digit databases, one in English (Aurora 2) and the other in Cantonese (CUDigit). Using condition-specific weights, significant performance improvements over the conventional unweighted baseline recognition system are attained under a variety of noise conditions. The overall relative Word Error Rate (WER) reductions are 36.55% and 41.92% for Aurora 2 and CUDigit respectively. The proposed approach is appealing for practical applications because: (1) noise estimation is not required for feature compensation; (2) adaptation of HMMs to noisy environments is not required; (3) only a minor modification of the decoding process is needed; (4) only a few feature weights need to be trained. (Abstract shortened by UMI.)
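The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes diagonal-covariance Gaussian state models, so that a frame's log-likelihood splits into a static part and a dynamic part, and it uses the fact that raising a likelihood to a power is the same as scaling its log-likelihood. The function names, the weights, and the per-state scores are all hypothetical and chosen only for illustration.

```python
import numpy as np


def diag_gauss_loglik(x, mean, var):
    """Log-density of x under a diagonal-covariance Gaussian (assumed model)."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var))


def weighted_loglik(static_ll, dynamic_ll, w_static=1.0, w_dynamic=1.0):
    """Exponentially weighting likelihoods p_s**w_s * p_d**w_d is a
    weighted sum in the log domain: w_s*log(p_s) + w_d*log(p_d)."""
    return w_static * static_ll + w_dynamic * dynamic_ll


def best_state(scores, w_static=1.0, w_dynamic=1.0):
    """Pick the state with the highest weighted score.

    scores maps a state name to (static_ll, dynamic_ll) for one frame.
    """
    return max(
        scores,
        key=lambda s: weighted_loglik(scores[s][0], scores[s][1], w_static, w_dynamic),
    )


# Hypothetical per-state log-likelihoods for one noisy frame: the static
# stream favours "two", the dynamic stream favours "one".
frame_scores = {"one": (-4.0, -1.0), "two": (-2.0, -2.5)}

unweighted_choice = best_state(frame_scores)                # equal weights
weighted_choice = best_state(frame_scores, w_static=0.5)    # static stream down-weighted
```

With equal weights the static stream dominates the decision. Halving the static weight lets the dynamic stream, which the abstract reports as more noise-robust, decide the outcome. In the thesis the weights are estimated discriminatively from development data rather than set by hand as they are here.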


Bibliographic record

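Author: Yang, Chen
Author affiliation: The Chinese University of Hong Kong (People's Republic of China)
Degree-granting institution: The Chinese University of Hong Kong (People's Republic of China)
Subject: Engineering, Electronics and Electrical
Degree: Ph.D.
Year: 2005
Pages: 141 p.
Total pages: 141
Original format: PDF
Language of text: English
Chinese Library Classification: Radio electronics, telecommunications technology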

Similar literature

1. Speech perception for adult cochlear implant recipients in a realistic background noise: effectiveness of preprocessing strategies and external options for improving speech recognition in noise. [J]. Gifford RH, Revit LJ. Journal of the American Academy of Audiology, 2010, No. 7.
2. Dynamic feature variance adaptation for robust speech recognition with a speech enhancement pre-processor. [J]. Marc DELCROIX, Tomohiro NAKATANI, Shinji WATANABE. 電子情報通信学会技術研究報告. 音声. Speech, 2007, No. 406.
3. Dynamic feature variance adaptation for robust speech recognition with a speech enhancement pre-processor. [J]. Marc DELCROIX, Tomohiro NAKATANI, Shinji WATANABE. 電子情報通信学会技術研究報告. 言語理解とコミュニケーション. Natural Language Understanding and Models of Communication, 2007, No. 405.
4. On Factorizing Spectral Dynamics for Robust Speech Recognition. [C]. Vivek Tyagi, Iain McCowan, Herve Bourlard. European Conference on Speech Communication and Technology, 2003.
5. Local feature extraction for robust speech recognition in the presence of noise. [D]. Tufekci, Zekeriya. 2001.
6. Recognizing the message and the messenger: biomimetic spectral analysis for robust speech and speaker recognition. [O]. Sridhar Krishna Nemala, Kailash Patil, Mounya Elhilali.
7. Spectral Reconstruction and Noise Model Estimation Based on a Masking Model for Noise Robust Speech Recognition. [O]. Gonzalez, J.A., Gómez, A.M., Peinado, A.M. 2017.
