MMSE-Based Missing-Feature Reconstruction With Temporal Modeling for Robust Speech Recognition

Gonzalez J. A.; Peinado A. M.; Ma N.; Gomez A. M.; Barker J.

Abstract

This paper addresses feature compensation in the log-spectral domain using the missing-data (MD) approach to noise-robust speech recognition, which assumes that each log-spectral feature is either almost unaffected by noise or completely masked by it. First, a general MD framework based on minimum mean square error (MMSE) estimation is introduced that exploits the correlation across frequency bands to reconstruct the missing features. This framework allows different MD imputation approaches to be derived and, in particular, a novel technique that takes advantage of truncated Gaussian distributions is presented. Although the proposed technique provides excellent results at high and medium signal-to-noise ratios (SNRs), its performance degrades at low SNRs, where very few reliable features are available. The reconstruction technique is therefore extended to exploit temporal constraints using two different approaches. In the first, time-frequency patches of speech spanning several consecutive frames are modeled with a Gaussian mixture model (GMM). In the second, the sequential structure of speech is instead modeled by a hidden Markov model (HMM). The proposed techniques are evaluated on the Aurora-2 and Aurora-4 databases using both oracle and estimated masks. In both cases, they outperform the baseline system and other related techniques in recognition performance. Moreover, temporal modeling turns out to be very effective for reconstructing spectra at low SNRs. In particular, HMMs show the highest capability of accounting for time correlations and therefore achieve the best results.
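
To make the idea of MMSE reconstruction from cross-band correlation concrete, the following is a minimal sketch of GMM-based conditional-mean imputation for a single frame. It is an illustration under stated assumptions, not the paper's method: it uses the plain (unbounded) Gaussian conditional mean and does not implement the truncated-Gaussian technique, the patch-based GMM, or the HMM extension. The function name `mmse_impute`, its arguments, and the toy model values are all hypothetical.

```python
import numpy as np

def mmse_impute(y, reliable, weights, means, covs):
    """Replace unreliable entries of y with a GMM-based MMSE estimate.

    y        : (D,) noisy log-spectral frame
    reliable : (D,) boolean mask, True where the feature is trusted
    weights  : (K,) GMM mixture weights (assumed to come from clean speech)
    means    : (K, D) component means
    covs     : (K, D, D) full component covariances
    """
    y = np.asarray(y, dtype=float)
    r = np.asarray(reliable, dtype=bool)
    m = ~r
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    if not m.any():
        return y.copy()  # nothing to reconstruct

    n_comp = len(weights)
    log_post = np.empty(n_comp)
    cond_means = np.empty((n_comp, m.sum()))
    for k in range(n_comp):
        mu, cov = means[k], covs[k]
        if r.any():
            diff = y[r] - mu[r]
            cov_rr = cov[np.ix_(r, r)]
            sol = np.linalg.solve(cov_rr, diff)
            _, logdet = np.linalg.slogdet(cov_rr)
            # log of w_k * N(y_r; mu_r, cov_rr)
            log_post[k] = np.log(weights[k]) - 0.5 * (
                diff @ sol + logdet + r.sum() * np.log(2.0 * np.pi)
            )
            # E[x_m | y_r, k] = mu_m + cov_mr cov_rr^{-1} (y_r - mu_r)
            cond_means[k] = mu[m] + cov[np.ix_(m, r)] @ sol
        else:
            # no reliable evidence: fall back to the prior
            log_post[k] = np.log(weights[k])
            cond_means[k] = mu[m]

    # component posteriors given the reliable features only
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    x_hat = y.copy()
    x_hat[m] = post @ cond_means  # posterior-weighted conditional means
    return x_hat

# Toy usage: one component, two correlated bands, second band masked by noise.
x_hat = mmse_impute(
    y=[2.0, 99.0],
    reliable=[True, False],
    weights=[1.0],
    means=[[0.0, 0.0]],
    covs=[[[1.0, 0.5], [0.5, 1.0]]],
)
print(x_hat)  # the second entry is reconstructed as 1.0
```

In this sketch the masked band is pulled toward the value predicted by its correlation with the reliable band. Under the same assumptions, stacking several consecutive frames into one long vector before calling the function would give a rough analogue of modeling time-frequency patches with a GMM.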

Bibliographic details

Source
Audio, Speech, and Language Processing, IEEE Transactions on | 2013, No. 3 | pp. 624-635 | 12 pages
Authors
Gonzalez J. A.; Peinado A. M.; Ma N.; Gomez A. M.; Barker J.
Author affiliation

Department of Teoría de la Señal, Telemática y Comunicaciones, Universidad de Granada, Granada, Spain

Original format: PDF
Text language: English
Keywords
Correlation; Covariance matrix; Estimation; Hidden Markov models; Noise; Reliability; Speech; Minimum mean square error estimation; missing-feature; robust speech recognition; spectral reconstruction;

Similar literature

1. Time–Frequency Correlation-Based Missing-Feature Reconstruction for Robust Speech Recognition in Band-Restricted Conditions [J]. Kim W., Hansen J. H. L. Audio, Speech, and Language Processing, IEEE Transactions on. 2009, No. 7
2. Parameter Tuning-Free Missing-Feature Reconstruction for Robust Sound Recognition [J]. Liu Qi, Wu Jibin. Selected Topics in Signal Processing, IEEE Journal of. 2021, No. 1
3. Spectral Reconstruction and Noise Model Estimation Based on a Masking Model for Noise Robust Speech Recognition [J]. Gonzalez Jose A., Gomez Angel M., Peinado Antonio M. Circuits, Systems, and Signal Processing. 2017, No. 9
4. Time-Frequency Correlation Based Missing-Feature Reconstruction for Robust Speech Recognition in Background Noise Conditions [C]. Kim Wooil, Hansen John H.L. Asilomar Conference on Signals, Systems and Computers. 2009
5. Reconstruction of Incomplete Spectrograms for Robust Speech Recognition [D]. Ramakrishnan, Bhiksha Raj. 2000
6. Sublexical Properties of Spoken Words Modulate Activity in Broca’s Area but Not Superior Temporal Cortex: Implications for Models of Speech Recognition [O]. Kenneth I. Vaden Jr., Tepring Piquado, Gregory Hickok
7. Robust Speech Recognition Using Multiple Prior Models for Speech Reconstruction [O]. Arun Narayanan, Xiaojia Zhao, Deliang Wang. 2013
