Noise robust front-end processing for automatic speech recognition.



Abstract

The performance of current automatic speech recognition (ASR) systems degrades greatly under noise. This dissertation focuses on the front-end approach to improving the noise robustness of ASR systems. Several novel algorithms are developed for feature extraction.

The first algorithm is variable frame rate analysis, which is inspired by human speech perception. It uses a high frame rate for rapidly changing, high-energy segments and a low frame rate for relatively steady segments.

An analysis-based nonlinear feature extraction approach is also proposed, inspired by a quantitative model of how additive noise affects speech amplitude spectra. Acoustic features are extracted from the noise-robust parts of speech spectra without losing discriminative information. Two nonlinear processing algorithms, harmonic demodulation and spectral peak-to-valley ratio locking, are designed to minimize the mismatch between clean and noisy speech features. A previously studied method, peak isolation (Strope & Alwan, 1997), is also discussed in terms of this model. These algorithms do not require noise estimation and are effective against both stationary and non-stationary noise backgrounds. A noise removal algorithm derived directly from the additive noise model is also tested and compared with the other new algorithms in this dissertation and with linear and nonlinear spectral subtraction methods.

The proposed front-end processing algorithms are tested in Hidden Markov Model (HMM) based speech recognition experiments with the TI46 database and the Aurora 2 database, and they yield significant improvements. For the TI46 isolated digits database, the average recognition rate across SNRs improves from 60% (for the widely used MFCC front-end) to 95% (using the proposed techniques) in the presence of additive speech-shaped noise. For the Aurora 2 connected digit-string database, the average recognition rate across SNRs and noise types, including non-stationary noise backgrounds, improves from 58% to 83%.

Finally, a DCT-based feature-coding scheme is proposed for distributed speech recognition. The coding scheme computes a 2D DCT on blocks of feature vectors, followed by uniform scalar quantization, run-length coding, and Huffman coding. Analysis and recognition experiments show that the 2D DCT can be an effective way to exploit the inter-frame correlation of acoustic features.
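
The first two stages of the feature-coding scheme described above (a 2D DCT over a block of feature vectors, then uniform scalar quantization) can be illustrated with a minimal Python sketch. The orthonormal DCT-II, the 4-frame by 3-coefficient block, the step size of 0.1, and all function names are assumptions chosen for illustration, not details from the dissertation; the run-length and Huffman stages are omitted.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows (assumed variant)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct2(block):
    """2D DCT of a (frames x coefficients) block: C_t * X * C_f^T."""
    c_t = dct_matrix(len(block))
    c_f = dct_matrix(len(block[0]))
    return matmul(matmul(c_t, block), transpose(c_f))

def idct2(coeffs):
    """Inverse of dct2; the basis is orthonormal, so the inverse is the transpose."""
    c_t = dct_matrix(len(coeffs))
    c_f = dct_matrix(len(coeffs[0]))
    return matmul(matmul(transpose(c_t), coeffs), c_f)

def quantize(coeffs, step):
    """Uniform scalar quantization: map each coefficient to an integer index."""
    return [[int(round(v / step)) for v in row] for row in coeffs]

def dequantize(indices, step):
    """Map integer indices back to reconstructed coefficient values."""
    return [[q * step for q in row] for row in indices]

# Hypothetical block: 4 consecutive frames with identical 3-dimensional features,
# i.e. perfectly correlated across time.
block = [[1.0, 2.0, 3.0] for _ in range(4)]
indices = quantize(dct2(block), step=0.1)
restored = idct2(dequantize(indices, step=0.1))
```

For this perfectly correlated block, every quantized index outside the first row of `indices` is zero, which is the kind of redundancy that a later run-length and entropy coding stage could exploit. Real feature blocks are only partially correlated, so the compaction would be less extreme.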


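Bibliographic record

Author: Zhu, Qifeng
Author affiliation: University of California, Los Angeles
Degree-granting institution: University of California, Los Angeles
Subject: Engineering Electronics and Electrical
Degree: Ph.D.
Year: 2001
Pages: 171 p.
Total pages: 171
Original format: PDF
Language of text: English
Chinese Library Classification: Radio electronics, telecommunications technology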

Similar literature

1. A feature extraction method using subband based periodicity and aperiodicity decomposition with noise robust frontend processing for automatic speech recognition [J]. Kentaro Ishizuka, Tomohiro Nakatani. Speech Communication, 2006, No. 11.
2. Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems [J]. Xugang Lu, Masashi Unoki, Masato Akagi. Acoustical Science and Technology, 2008, No. 6.
3. Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems [J]. Masashi Unoki, Masato Akagi, Xugang Lu. Acoustical Science and Technology, 2008, No. 6.
4. Incorporating a Generative Front-end Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition [C]. Souvik Kundu, Khe Chai Sim, Mark Gales. Annual Conference of the International Speech Communication Association, 2016.
5. Multi-microphone correlation-based processing for robust automatic speech recognition [D]. Sullivan, Thomas M. 1996.
6. A Low-Noise Modular and Versatile Analog Front-End Intended for Processing In Vitro Neuronal Signals Detected by Microelectrode Arrays [O]. Giulia Regalia, Emilia Biffi, Giancarlo Ferrigno. 2015.
7. Front-End Post-Processing Using Histogram Equalization Combined with ARMA Filtering for Noise Robust Speech Recognition [O]. Shariati Seyedeh Saloomeh, Ahadi Mohammad, Mohammadi Karim. 2007.
8. Normalized Amplitude Modulation Features for Large Vocabulary Noise-Robust Speech Recognition [R]. Mitra, V., Franco, H., Graciarena, M. 2012.
