Normalized amplitude modulation features for large vocabulary noise-robust speech recognition

Abstract

Background noise and channel degradations seriously constrain the performance of state-of-the-art speech recognition systems. Studies comparing human speech recognition with automatic speech recognition indicate that the human auditory system is highly robust to background noise and channel variability compared with automated systems. A traditional way to add robustness to a speech recognition system is to construct a robust feature set for the recognition model. In this work, we present an amplitude modulation feature derived from Teager's nonlinear energy operator that is power normalized and cosine transformed to produce normalized modulation cepstral coefficient (NMCC) features. The proposed NMCC features are compared against state-of-the-art noise-robust features on Aurora-2 and on a renoised Wall Street Journal (WSJ) corpus. The WSJ word-recognition experiments were performed on both the clean and an artificially renoised WSJ corpus using SRI's DECIPHER large vocabulary speech recognition system. The experiments were performed under three train-test conditions: (a) matched, (b) mismatched, and (c) multi-conditioned. The Aurora-2 digit recognition task was performed using the standard HTK recognizer distributed with Aurora-2. Our results indicate that the proposed NMCC features demonstrated noise robustness in almost all train-test conditions on the renoised WSJ data and also improved digit recognition accuracy on Aurora-2 compared with mel-frequency cepstral coefficients (MFCCs) and state-of-the-art noise-robust features.
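The abstract describes the feature pipeline only at a high level: a Teager-energy-based amplitude modulation (AM) estimate, power normalization, and a cosine transform. The following is a minimal Python sketch of an NMCC-style computation under stated assumptions, not the authors' implementation. It assumes the input has already been split into subband signals (for example by a gammatone filterbank, which is not shown), estimates each band's AM envelope with the DESA-1 energy separation algorithm built on the Teager operator, averages AM power over frames, compresses it with a power-law root, and applies an orthonormal DCT-II across bands. The helper names (`teager`, `desa1_amplitude`, `dct_ii_ortho`, `nmcc_like`) and the parameter values (400-sample frames, 160-sample hop, a 1/15 root, 13 coefficients) are hypothetical choices, not values taken from this record.

```python
import numpy as np

# Minimal NMCC-style sketch. Helper names and default parameters are
# hypothetical/illustrative assumptions, not taken from the paper.


def teager(x: np.ndarray) -> np.ndarray:
    """Teager energy operator: psi[x](n) = x(n)^2 - x(n-1) * x(n+1).

    Output index i corresponds to input index n = i + 1.
    """
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa1_amplitude(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """AM envelope via DESA-1; output index i corresponds to input n = i + 2."""
    psi_x = teager(x)           # psi[x](n) for n = 1 .. N-2
    psi_y = teager(np.diff(x))  # y(n) = x(n) - x(n-1); psi[y](n) for n = 2 .. N-2
    # G(n) = 1 - (psi[y](n) + psi[y](n+1)) / (4 psi[x](n)), an estimate of cos(Omega)
    g = 1.0 - (psi_y[:-1] + psi_y[1:]) / (4.0 * psi_x[1:-1] + eps)
    g = np.clip(g, -1.0, 1.0)
    # |a(n)| = sqrt(psi[x](n) / (1 - G(n)^2))
    return np.sqrt(np.maximum(psi_x[1:-1], 0.0) / np.maximum(1.0 - g ** 2, eps))


def dct_ii_ortho(v: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II along the last axis."""
    n = v.shape[-1]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return v @ basis.T


def nmcc_like(subbands: np.ndarray, win: int = 400, hop: int = 160,
              root: float = 1.0 / 15.0, n_ceps: int = 13) -> np.ndarray:
    """NMCC-style cepstra from subband signals shaped (bands, samples).

    Returns an array shaped (frames, min(n_ceps, bands)).
    """
    env = np.stack([desa1_amplitude(band) for band in subbands])
    if env.shape[1] < win:
        raise ValueError("signal too short for one analysis frame")
    n_frames = 1 + (env.shape[1] - win) // hop
    # Mean AM power per frame and band -> shape (frames, bands).
    power = np.stack([
        np.mean(env[:, t * hop:t * hop + win] ** 2, axis=1)
        for t in range(n_frames)
    ])
    # Power-law compression (assumed 1/15 root) instead of a logarithm.
    compressed = power ** root
    # Cosine transform across bands, keeping the first n_ceps coefficients.
    return dct_ii_ortho(compressed)[:, :n_ceps]
```

For a pure tone A cos(Ωn + φ), DESA-1 recovers A exactly, so a set of constant-amplitude tones produces the same cepstral vector in every frame. A power-law root is used here because, unlike the logarithm used for MFCCs, it stays bounded as band power approaches zero.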

Bibliographic record

Source: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 4117-4120 (4 pages)
Conference location: Kyoto, Japan
Author(s): Mitra, Vikramjit
Affiliation: Speech Technology and Research Laboratory, SRI International, Menlo Park, CA, USA
Original format: PDF

Similar literature

1. A Low-Complexity Parabolic Lip Contour Model With Speaker Normalization for High-Level Feature Extraction in Noise-Robust Audiovisual Speech Recognition [J]. Borgstrom B.J., Alwan A. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 2008, no. 6.

2. Temporal modulation normalization for robust speech feature extraction and recognition [J]. Xugang Lu, Shigeki Matsuda, Masashi Unoki. Multimedia Tools and Applications, 2011, no. 1.

3. Speech emotion recognition using amplitude modulation parameters and a combined feature selection procedure [J]. Arianna Mencattini, Eugenio Martinelli, Giovanni Costantini. Knowledge-Based Systems, 2014, June issue.

4. Normalized amplitude modulation features for large vocabulary noise-robust speech recognition [C]. Mitra V., Franco H., Graciarena M. IEEE International Conference on Acoustics, Speech and Signal Processing, 2011.

5. Duration normalization for robust recognition of spontaneous speech via missing feature methods [D]. Nedel, Jon P. 2004.

6. Speech recognition with amplitude and frequency modulations [O]. Fan-Gang Zeng, Kaibao Nie, Ginger S. Stickney. 2005.

7. Normalized amplitude modulation features for large vocabulary noise-robust speech recognition [O]. Vikramjit Mitra, Horacio Franco, Martin Graciarena. 2012.

8. Normalized Amplitude Modulation Features for Large Vocabulary Noise-Robust Speech Recognition [R]. Mitra, V., Franco, H., Graciarena, M. 2012.
