
Noise robust front-end processing for automatic speech recognition.



Abstract

The performance of current automatic speech recognition (ASR) systems degrades greatly under noise. This dissertation focuses on front-end approaches to improving the noise robustness of ASR systems and develops several novel feature-extraction algorithms.

The first algorithm is variable frame rate analysis, which is inspired by human speech perception. It uses a high frame rate for rapidly changing, high-energy segments and a low frame rate for relatively steady segments.

An analysis-based nonlinear feature extraction approach is also proposed, inspired by a quantitative model of how additive noise affects speech amplitude spectra. Acoustic features are extracted from the noise-robust parts of the speech spectrum without losing discriminative information. Two nonlinear processing algorithms, harmonic demodulation and spectral peak-to-valley ratio locking, are designed to minimize the mismatch between clean and noisy speech features. A previously studied method, peak isolation (Strope & Alwan, 1997), is also analyzed with this model. These algorithms do not require noise estimation and are effective in both stationary and non-stationary noise backgrounds. A noise removal algorithm derived directly from the additive noise model is also tested and compared with the other new algorithms in this dissertation and with linear and nonlinear spectral subtraction.

The proposed front-end processing algorithms are tested in Hidden Markov Model (HMM) based speech recognition experiments on the TI46 and Aurora 2 databases, and they yield significant improvements. For the TI46 isolated-digit database with additive speech-shaped noise, the average recognition rate across SNRs improves from 60% with the widely used MFCC front-end to 95% with the proposed techniques. For the Aurora 2 connected-digit-string database, the average recognition rate across noise types (including non-stationary backgrounds) and SNRs improves from 58% to 83%.

Finally, a DCT-based feature-coding scheme is proposed for distributed speech recognition. The scheme computes a 2D DCT on blocks of feature vectors, followed by uniform scalar quantization, run-length coding, and Huffman coding. Analysis and recognition experiments show that the 2D DCT can be an effective way of exploiting the inter-frame correlation of acoustic features.
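The abstract describes variable frame rate (VFR) analysis only qualitatively. The following is a minimal sketch of one common way to realize the idea, assuming features are first computed at a fixed, fine frame rate, that "rapid change" is measured as the Euclidean distance between consecutive feature vectors, and that this change is weighted by normalized log energy so quiet segments contribute little. The function name `select_vfr_frames` and the `threshold` parameter are hypothetical; this is not the dissertation's exact selection criterion.

```python
import numpy as np


def select_vfr_frames(features, log_energy, threshold):
    """Return indices of frames kept by an energy-weighted VFR rule (illustrative sketch).

    features:   (T, D) per-frame feature vectors computed at a fixed, fine frame rate.
    log_energy: (T,) per-frame log energies.
    threshold:  accumulated weighted change that triggers keeping a frame (hypothetical).
    """
    features = np.asarray(features, dtype=float)
    log_energy = np.asarray(log_energy, dtype=float)
    # Map log energy to [0, 1] so that feature changes in louder frames count more.
    shifted = log_energy - log_energy.min()
    peak = shifted.max()
    weight = shifted / peak if peak > 0 else np.ones_like(shifted)
    kept = [0]  # always keep the first frame
    accumulated = 0.0
    for t in range(1, len(features)):
        # Change between consecutive feature vectors, scaled by this frame's energy weight.
        accumulated += weight[t] * np.linalg.norm(features[t] - features[t - 1])
        if accumulated >= threshold:
            # Enough weighted change has built up: emit a frame and restart the count.
            kept.append(t)
            accumulated = 0.0
    return np.array(kept)
```

A steady or quiet stretch accumulates change slowly and yields few frames, while a loud, rapidly changing stretch crosses the threshold on nearly every frame, giving the high-rate/low-rate behavior the abstract describes.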
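The proposed algorithms are compared against linear and nonlinear spectral subtraction, which, unlike the proposed methods, depend on a noise estimate. For reference, here is a minimal sketch of conventional linear power spectral subtraction. It relies on the standard additive-noise approximation that, for uncorrelated speech and noise, the noisy power spectrum is roughly the clean power spectrum plus the noise power spectrum. The noise estimate (an average over leading frames assumed to be speech-free), the over-subtraction factor `alpha`, and the floor `beta` are common textbook choices assumed here, not settings taken from the dissertation.

```python
import numpy as np


def spectral_subtraction(noisy_power, num_noise_frames=10, alpha=1.0, beta=0.01):
    """Linear power spectral subtraction (baseline sketch).

    noisy_power:      (T, K) per-frame power spectra |Y(t, k)|^2.
    num_noise_frames: leading frames assumed to contain only noise (assumption).
    alpha:            over-subtraction factor.
    beta:             spectral floor, as a fraction of the noisy power.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    # Additive model: |Y|^2 is approximately |X|^2 + |N|^2, so estimate |N|^2 from noise-only frames.
    noise_power = noisy_power[:num_noise_frames].mean(axis=0)
    clean_estimate = noisy_power - alpha * noise_power
    # Floor each bin so that no power estimate becomes negative or exactly zero.
    return np.maximum(clean_estimate, beta * noisy_power)
```

The explicit `noise_power` estimate is the step the proposed harmonic-demodulation and peak-to-valley-locking algorithms avoid, which is why the abstract stresses that they need no noise estimation.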

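The feature-coding pipeline in the final paragraph of the abstract (2D DCT on blocks of feature vectors, uniform scalar quantization, run-length coding, Huffman coding) can be sketched as follows. This assumes a block of `frames x coefficients`, an orthonormal DCT-II applied along both axes, a single hypothetical quantizer `step`, a row-major scan of the quantized block, and run-length pairs of the form `(zero_run, value)`. All of these are illustrative choices, not the dissertation's exact block size, scan order, or bit allocation.

```python
import heapq
from collections import Counter

import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] /= np.sqrt(2.0)  # scale the DC row so the matrix is orthonormal
    return m


def dct2(block):
    """2D DCT of a (frames x coefficients) block: along time, then along the feature axis."""
    rows, cols = block.shape
    return dct_matrix(rows) @ block @ dct_matrix(cols).T


def run_length(symbols):
    """Encode a sequence as (zero_run, value) pairs; trailing zeros become one (run, 0) pair."""
    pairs, run = [], 0
    for s in symbols:
        if s == 0:
            run += 1
        else:
            pairs.append((run, int(s)))
            run = 0
    if run:
        pairs.append((run, 0))
    return pairs


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries are (count, unique tiebreak, {symbol: code suffix}); dicts are never compared.
    heap = [(c, i, {s: ""}) for i, (s, c) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        c1, _, a = heapq.heappop(heap)
        c2, _, b = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in a.items()}
        merged.update({s: "1" + code for s, code in b.items()})
        heapq.heappush(heap, (c1 + c2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encode_block(block, step):
    """2D DCT, uniform scalar quantization, run-length, then Huffman coding.

    Returns (bitstring, code table, quantized DCT block).
    """
    quantized = np.round(dct2(np.asarray(block, dtype=float)) / step).astype(int)
    pairs = run_length(quantized.ravel())  # row-major scan (an assumption)
    code = huffman_code(pairs)
    bits = "".join(code[p] for p in pairs)
    return bits, code, quantized
```

Because consecutive feature vectors are strongly correlated, the DCT along the time axis concentrates most of the block's energy into a few low-order coefficients. After quantization, most entries are zero, which is exactly what run-length and Huffman coding compress well. A decoder reverses the steps and reconstructs the block as `dct_matrix(rows).T @ (quantized * step) @ dct_matrix(cols)`.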
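Bibliographic record

  • Author

    Zhu, Qifeng

  • Affiliation

    University of California, Los Angeles

  • Degree-granting institution: University of California, Los Angeles
  • Discipline: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2001
  • Pages: 171 p.
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Radio electronics and telecommunication technology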
