Time–Frequency Correlation-Based Missing-Feature Reconstruction for Robust Speech Recognition in Band-Restricted Conditions

Kim W.; Hansen J. H. L.


Abstract

Band-limited speech represents one of the most challenging factors for robust speech recognition. This is especially true when supporting audio corpora drawn from sources with a range of conditions, as in spoken document retrieval, which requires effective automatic speech recognition. The conventional missing-feature reconstruction method has a problem when applied to band-limited speech, since it assumes that the observations in the unreliable regions are always greater than the latent original clean speech. To mitigate the problem, the approach developed here depends only on reliable components to calculate the posterior probability. This study proposes an advanced method that utilizes the correlation information of the spectral components across the time and frequency axes in an effort to increase the performance of missing-feature reconstruction in band-limited conditions. We employ an F1 Area Window and a Cutoff Border Window in order to include more knowledge of reliable components that are highly correlated with the cutoff frequency band. To detect the cutoff regions for missing-feature reconstruction, blind mask estimation is also presented, which employs a synthesized band-limited speech model without secondary training data. Experiments to evaluate the performance of the proposed methods are conducted using the SPHINX3 speech recognition engine and the TIMIT corpus. Experimental results demonstrate that the proposed time–frequency (TF) correlation based missing-feature reconstruction method is significantly more effective in improving band-limited speech recognition accuracy. By employing the proposed TF missing-feature reconstruction method, we obtain an average relative improvement in word error rate (WER) of up to 14.61% over four available bandwidths, with cutoff frequencies of 1.0, 1.5, 2.0, and 2.5 kHz, compared to earlier formulated methods.
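The idea of computing the posterior from reliable components only can be illustrated with a minimal sketch. It assumes a Gaussian mixture model of clean log-spectral vectors with full covariances, which is a generic cluster-based reconstruction and not necessarily the authors' formulation. The function name `reconstruct_missing`, its parameters, and the toy values are hypothetical, and the F1 Area Window, Cutoff Border Window, and blind mask estimation are not modeled.

```python
import numpy as np

def reconstruct_missing(x, reliable, weights, means, covs):
    """Fill the unreliable entries of one spectral vector from a GMM.

    Cluster posteriors use the reliable entries only (the unreliable
    observations are ignored entirely), which is an assumption of this
    sketch. Each cluster then predicts the missing entries through the
    Gaussian conditional mean, and the predictions are posterior-weighted.
    """
    x = np.asarray(x, dtype=float)
    r = np.asarray(reliable, dtype=bool)   # reliable components
    m = ~r                                 # missing (cut-off) components
    x_r = x[r]

    n_clusters = len(weights)
    log_post = np.empty(n_clusters)
    cond_means = np.empty((n_clusters, int(m.sum())))

    for k in range(n_clusters):
        mu_r, mu_m = means[k][r], means[k][m]
        s_rr = covs[k][np.ix_(r, r)]       # reliable-reliable covariance
        s_mr = covs[k][np.ix_(m, r)]       # missing-reliable cross covariance
        diff = x_r - mu_r
        sol = np.linalg.solve(s_rr, diff)
        _, logdet = np.linalg.slogdet(s_rr)
        # Log of weight times marginal Gaussian likelihood of x_r.
        log_post[k] = np.log(weights[k]) - 0.5 * (
            diff @ sol + logdet + r.sum() * np.log(2.0 * np.pi)
        )
        # Conditional mean of the missing part given the reliable part.
        cond_means[k] = mu_m + s_mr @ sol

    # Normalize in the log domain for numerical stability.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    out = x.copy()
    out[m] = post @ cond_means
    return out, post
```

In a band-limited frame the observed energy above the cutoff is lower than the clean speech energy, so a bound of the form "clean is at most the observation" would be misleading there. The sketch sidesteps that by never reading the unreliable observations, at the cost of discarding whatever information they might carry.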


Bibliographic record

Source: Audio, Speech, and Language Processing, IEEE Transactions on, 2009, No. 7, pp. 1292-1304 (13 pages)
Authors: Kim W.; Hansen J. H. L.
Original format: PDF
Language: English
Keywords: Band-limited speech; correlation; missing-feature; speech recognition; time–frequency (TF)


