Published in: Audio, Speech, and Language Processing, IEEE Transactions on

Time–Frequency Correlation-Based Missing-Feature Reconstruction for Robust Speech Recognition in Band-Restricted Conditions


Abstract

Band-limited speech is one of the most challenging conditions for robust speech recognition. This is especially true in spoken document retrieval, where audio corpora come from sources with a range of conditions and effective automatic speech recognition is required. The missing-feature reconstruction method has a problem when applied to band-limited speech reconstruction, since it assumes that the observations in the unreliable regions are always greater than the latent original clean speech. To mitigate this problem, the approach developed here depends only on reliable components to calculate the posterior probability. This study proposes an advanced method that exploits the correlation of the spectral components across the time and frequency axes in order to improve the performance of missing-feature reconstruction in band-limited conditions. We employ an F1 Area Window and a Cutoff Border Window in order to include more knowledge of the reliable components that are highly correlated with the cutoff frequency band. To detect the cutoff regions for missing-feature reconstruction, a blind mask estimation scheme is also presented, which employs a synthesized band-limited speech model and needs no secondary training data. The proposed methods are evaluated using the SPHINX3 speech recognition engine and the TIMIT corpus. Experimental results demonstrate that the proposed time–frequency (TF) correlation based missing-feature reconstruction method is significantly more effective in improving band-limited speech recognition accuracy. By employing the proposed TF missing-feature reconstruction method, we obtain an average relative improvement in word error rate (WER) of up to 14.61% over four available bandwidths with cutoff frequencies of 1.0, 1.5, 2.0, and 2.5 kHz, compared to earlier formulated methods.
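To make the idea of computing the posterior from reliable components only more concrete, the following is a minimal sketch of generic cluster-based missing-feature reconstruction with a diagonal-covariance Gaussian mixture. It is an illustration under stated assumptions, not the authors' implementation: the function name `reconstruct_missing`, the diagonal-covariance model, and the toy parameters are all hypothetical, and the F1 Area Window, Cutoff Border Window, and blind mask estimation described in the abstract are not modeled here.

```python
import numpy as np

def reconstruct_missing(x, reliable, weights, means, variances):
    """Fill the unreliable components of one log-spectral frame.

    x         : (D,) observed frame (values in unreliable bins are ignored)
    reliable  : (D,) boolean mask, True where the bin is reliable
    weights   : (K,) mixture weights of a hypothetical clean-speech GMM
    means     : (K, D) cluster means
    variances : (K, D) diagonal cluster variances
    """
    x = np.asarray(x, dtype=float)
    reliable = np.asarray(reliable, dtype=bool)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # Cluster log-likelihoods from the reliable bins only: the unreliable
    # bins are marginalized out, so no bound on their values is assumed.
    diff = x[reliable] - means[:, reliable]
    var_r = variances[:, reliable]
    log_lik = -0.5 * np.sum(diff ** 2 / var_r + np.log(2.0 * np.pi * var_r), axis=1)

    # Posterior over clusters, normalized in the log domain for stability.
    log_post = np.log(weights) + log_lik
    log_post -= np.max(log_post)
    post = np.exp(log_post)
    post /= post.sum()

    # With diagonal covariances, the estimate of each unreliable bin is the
    # posterior-weighted average of the cluster means for that bin.
    x_hat = x.copy()
    x_hat[~reliable] = post @ means[:, ~reliable]
    return x_hat, post

# Toy usage: two clusters, the upper two bins lost to a band limit.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0, 0.0, 0.0], [10.0, 10.0, 10.0, 10.0]])
variances = np.ones((2, 4))
frame = np.array([9.8, 10.1, -50.0, -50.0])
mask = np.array([True, True, False, False])
x_hat, post = reconstruct_missing(frame, mask, weights, means, variances)
```

In this toy case the reliable bins sit close to the second cluster, so the posterior concentrates there and the lost bins are filled with values near that cluster's means. A bounded variant would additionally constrain the estimate by the observed value, which is the assumption the abstract identifies as problematic for band-limited speech.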
