Normalization of the Speech Modulation Spectra for Robust Speech Recognition

Xiong Xiao; Eng Siong Chng; Haizhou Li


Abstract

In this paper, we study a novel technique that normalizes the modulation spectra of speech signals for robust speech recognition. The modulation spectra of a speech signal are the power spectral density (PSD) functions of the feature trajectories generated from the signal; hence, they describe the temporal structure of the features. The modulation spectra are distorted when the speech signal is corrupted by noise. We propose the temporal structure normalization (TSN) filter to reduce the noise effects by normalizing the modulation spectra to reference spectra. The TSN filter differs from other feature normalization methods such as histogram equalization (HEQ), which normalize only the probability distributions of the speech features. Our previous work showed promising results for TSN on the small-vocabulary Aurora-2 task. In this paper, we examine the theoretical and practical issues of the TSN filter as follows. 1) We investigate the effects of noise on the speech modulation spectra and show the general characteristics of noisy speech modulation spectra. The observations help to further explain and justify the TSN filter. 2) We evaluate the TSN filter on the Aurora-4 task and demonstrate its effectiveness for a large-vocabulary task. 3) We propose a segment-based implementation of the TSN filter that reduces the processing delay significantly without affecting the performance. Overall, the TSN filter produces significant improvements over the baseline systems and delivers competitive results when compared to other state-of-the-art temporal filters.
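The idea of normalizing a feature trajectory's modulation spectrum toward a reference can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the abstract does not give the filter design, so the square-root PSD ratio, the inverse-FFT design with a Hann window, the unit DC gain, and all names and parameters (`modulation_spectrum`, `design_tsn_filter`, `apply_tsn`, `n_fft`, `n_taps`) are assumptions made for this example. The synthetic trajectories stand in for real features such as cepstral coefficients.

```python
import numpy as np


def modulation_spectrum(trajectory, n_fft=32):
    """Welch-style PSD estimate of one feature trajectory (one value per frame)."""
    x = np.asarray(trajectory, dtype=float)
    if len(x) < n_fft:
        raise ValueError("trajectory is shorter than n_fft frames")
    window = np.hanning(n_fft)
    hop = n_fft // 2  # 50% overlap between analysis segments
    periodograms = [
        np.abs(np.fft.rfft(window * x[start:start + n_fft])) ** 2
        for start in range(0, len(x) - n_fft + 1, hop)
    ]
    return np.mean(periodograms, axis=0) / np.sum(window ** 2)


def design_tsn_filter(psd_ref, psd_test, n_taps=21, eps=1e-10):
    """FIR taps whose magnitude response approximates sqrt(psd_ref / psd_test).

    Assumed design: zero-phase inverse FFT, truncation, Hann window, unit DC gain.
    """
    if n_taps % 2 == 0:
        raise ValueError("n_taps must be odd so the filter is symmetric")
    desired = np.sqrt((np.asarray(psd_ref) + eps) / (np.asarray(psd_test) + eps))
    # irfft of a real magnitude response gives a real, even-symmetric impulse
    # response; fftshift moves its peak to the middle of the array.
    impulse = np.fft.fftshift(np.fft.irfft(desired))
    mid, half = len(impulse) // 2, n_taps // 2
    if half >= mid:
        raise ValueError("n_taps is too large for this PSD resolution")
    taps = impulse[mid - half:mid + half + 1] * np.hanning(n_taps)
    return taps / taps.sum()  # normalize so the gain at 0 Hz is one


def apply_tsn(trajectory, taps):
    """Filter a trajectory with the symmetric taps (no added delay)."""
    return np.convolve(np.asarray(trajectory, dtype=float), taps, mode="same")


# Synthetic stand-ins: a smooth "clean" trajectory and a noisy version of it.
rng = np.random.default_rng(0)
reference = np.convolve(rng.standard_normal(400), np.ones(5) / 5, mode="same")
noisy = reference + 0.5 * rng.standard_normal(400)

taps = design_tsn_filter(modulation_spectrum(reference), modulation_spectrum(noisy))
normalized = apply_tsn(noisy, taps)
```

In practice the reference spectrum would presumably be estimated from clean training data rather than from a single trajectory, and one filter would be designed per feature dimension; both points are assumptions here, since the abstract states only that the spectra are normalized "to reference spectra".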


Bibliographic details

Source: IEEE Transactions on Audio, Speech, and Language Processing, 2008, no. 8, pp. 1662-1674 (13 pages)
Authors: Xiong Xiao; Eng Siong Chng; Haizhou Li
Original format: PDF
Text language: English
CLC classification: Automation technology, computer technology
Keywords: modulation; speech recognition; statistical distributions; histogram equalization; power spectral density functions; probability distributions; robust speech recognition; speech features; speech modulation spectra; temporal structure normalization; Aurora task; fea;


Similar literature


1. Temporal modulation normalization for robust speech feature extraction and recognition [J]. Xugang Lu, Shigeki Matsuda, Masashi Unoki. Multimedia Tools and Applications, 2011, no. 1.
2. Speaker normalized spectral subband parameters for noise robust speech recognition [J]. Satoru Tsuge, Toshiaki Fukada, Harald Singer. The Journal of the Acoustical Society of Japan, 1999, no. 6.
3. Temporal Structure Normalization of Speech Feature for Robust Speech Recognition [J]. Xiao X., Chng E. S., Li H. IEEE Signal Processing Letters, 2007, no. 7.
4. The study of q-logarithmic modulation spectral normalization for robust speech recognition [C]. Fan Hao-teng, Hsu Che-hsien, Hung Jeih-weih. 2012 International Conference on System Science and Engineering (ICSSE), 2012.
5. Compressive nonlinearity for representing speech spectral magnitude to improve noise robustness of automatic speech recognition [D]. Wong, Brian. 2011.
6. Comparing auditory filter bandwidths, spectral ripple modulation detection, spectral ripple discrimination, and speech recognition: Normal and impaired hearing [O]. Evelyn Davies-Venn, Peggy Nelson.
7. Normalized amplitude modulation features for large vocabulary noise-robust speech recognition [O]. Vikramjit Mitra, Horacio Franco, Martin Graciarena. 2012.
8. Normalized Amplitude Modulation Features for Large Vocabulary Noise-Robust Speech Recognition [R]. Mitra, V., Franco, H., Graciarena, M. 2012.
