IEEE/ACM Transactions on Audio, Speech, and Language Processing

Coupled Dictionaries for Exemplar-Based Speech Enhancement and Automatic Speech Recognition



Abstract

Exemplar-based speech enhancement systems work by decomposing the noisy speech into a weighted sum of speech and noise exemplars stored in a dictionary. They then use the resulting speech and noise estimates to obtain a time-varying filter in the full-resolution frequency domain, and this filter enhances the noisy speech. For the decomposition itself, exemplars sampled in lower-dimensional spaces are preferred over the full-resolution frequency domain because they reduce computational complexity and generalize better to unseen cases. However, the resulting filter may be sub-optimal, because mapping the obtained speech and noise estimates back to the full-resolution frequency domain yields only a low-rank approximation. This paper proposes an efficient way to compute the full-resolution frequency estimates of speech and noise directly, using coupled dictionaries. An input dictionary contains atoms from the desired exemplar space and is used to obtain the decomposition. A coupled output dictionary contains the corresponding exemplars from the full-resolution frequency domain. We also introduce modulation spectrogram features for exemplar-based tasks using this approach. The proposed system was evaluated for various choices of input exemplars and improved speech enhancement performance on the AURORA-2 and AURORA-4 databases. We further show that the proposed approach also lowers word error rates (WERs) in speech recognition tasks using HMM-GMM and deep neural network (DNN) based systems.
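The following sketch shows the decomposition and the coupled-dictionary mapping for a single frame. It is a minimal illustration under stated assumptions, not the authors' implementation. The weights are found with multiplicative updates for a sparsity-penalized generalized Kullback-Leibler divergence. That solver is common in exemplar-based decomposition but is an assumption here. The filter is assumed to be a Wiener-like ratio mask. The function names (`sparse_kl_activations`, `coupled_dictionary_enhance`) and the toy dictionaries are hypothetical. The key point is that the weights estimated in the low-dimensional input space are reused unchanged with the coupled full-resolution exemplars, so no low-rank mapping back to full resolution is needed.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: M[row][col]; each column is one exemplar (atom)

EPS = 1e-12  # guards against division by zero


def matvec(M: Matrix, x: Vector) -> Vector:
    """Return M @ x for a row-major matrix."""
    return [sum(m_rk * x_k for m_rk, x_k in zip(row, x)) for row in M]


def sparse_kl_activations(A: Matrix, y: Vector, n_iter: int = 200,
                          sparsity: float = 0.0) -> Vector:
    """Non-negative weights x such that A @ x approximates y.

    Assumed solver: multiplicative updates for the generalized KL divergence
    plus an L1 penalty sparsity * sum(x):
        x <- x * (A^T (y / (A x))) / (A^T 1 + sparsity)
    """
    n_atoms = len(A[0])
    x = [1.0] * n_atoms
    col_sums = [sum(row[k] for row in A) for k in range(n_atoms)]  # A^T 1
    for _ in range(n_iter):
        approx = matvec(A, x)
        ratio = [y_r / (a_r + EPS) for y_r, a_r in zip(y, approx)]
        # All weights are updated from the same (previous) x and ratio.
        x = [x[k] * sum(row[k] * r for row, r in zip(A, ratio))
             / (col_sums[k] + sparsity + EPS)
             for k in range(n_atoms)]
    return x


def coupled_dictionary_enhance(
    A_speech: Matrix, A_noise: Matrix,  # input dictionaries (low-dim exemplar space)
    B_speech: Matrix, B_noise: Matrix,  # coupled output dictionaries (full-res spectra)
    y_in: Vector,                       # noisy frame in the input exemplar space
    noisy_full: Vector,                 # same frame's full-resolution noisy magnitude
    n_iter: int = 200, sparsity: float = 0.0,
) -> Tuple[Vector, Vector]:
    """Return (mask, enhanced) for one frame (hypothetical helper)."""
    # Decompose the noisy frame over the stacked input dictionary [A_speech | A_noise].
    A = [rs + rn for rs, rn in zip(A_speech, A_noise)]
    x = sparse_kl_activations(A, y_in, n_iter, sparsity)
    n_speech = len(A_speech[0])
    x_speech, x_noise = x[:n_speech], x[n_speech:]
    # Reuse the same weights with the coupled full-resolution exemplars.
    s_hat = matvec(B_speech, x_speech)
    n_hat = matvec(B_noise, x_noise)
    # Assumed Wiener-like filter: speech / (speech + noise), per full-res bin.
    mask = [s / (s + n + EPS) for s, n in zip(s_hat, n_hat)]
    enhanced = [m * v for m, v in zip(mask, noisy_full)]
    return mask, enhanced


if __name__ == "__main__":
    # Toy example: 2-dim input space, 4-bin full-resolution space.
    A_s, A_n = [[1.0], [0.0]], [[0.0], [1.0]]
    B_s = [[1.0], [1.0], [0.0], [0.0]]
    B_n = [[0.0], [0.0], [1.0], [1.0]]
    mask, enhanced = coupled_dictionary_enhance(
        A_s, A_n, B_s, B_n, [2.0, 1.0], [2.0, 2.0, 1.0, 1.0])
    print(mask, enhanced)  # mask ≈ [1, 1, 0, 0], enhanced ≈ [2, 2, 0, 0]
```

Each input atom and its coupled output atom share one weight. The input space can therefore be any feature space (for example mel bands or modulation spectrogram features), provided both dictionaries are built from the same underlying exemplars.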
