Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques


Abstract

Time-frequency masking has emerged as a powerful technique for separating sources in noisy, convolutive speech mixtures. It has also been applied successfully to noisy speech recognition. However, while suitable masking functions can yield significant SNR gains, speech recognition performance suffers from the nonlinear operations involved, so a greatly improved SNR often translates into only a slight improvement in the recognition rate. To address this problem, marginalization techniques have been used for speech recognition, but they require speech recognition and source separation to be carried out in the same domain. Source separation and denoising, however, are often carried out in the short-time Fourier transform (STFT) domain, whereas the most useful speech recognition features are, for example, mel-frequency cepstral coefficients (MFCCs), LPC cepstral coefficients, and VQ features. In these cases, marginalization techniques are not directly applicable. Here, another approach is suggested: it estimates sufficient statistics for the speech features in the preprocessing (e.g. STFT) domain, propagates these statistics through the transforms from the spectrum to, for example, the MFCCs of a speech recognition system, and uses the estimated statistics for missing-data speech recognition. With this approach, significant gains in speech recognition rates can be achieved, and in this context, time-frequency masking yields recognition rate improvements of more than 35% when compared to TF-masking based source separation.
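The abstract does not say how the statistics are carried from the spectral domain to the recognition features, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each power-spectrum bin of one frame is described by a mean and a variance, that bins are treated as independent (diagonal covariances throughout), and that the logarithm is handled with a first-order Taylor approximation. The filterbank is a toy with linearly spaced triangles rather than true mel spacing, and all function names are hypothetical.

```python
import math

def triangular_filterbank(n_filters, n_bins):
    """Toy triangular filters, linearly spaced (a real front end would use mel spacing)."""
    width = (n_bins - 1) / (n_filters + 1)
    centers = [(m + 1) * width for m in range(n_filters)]
    return [[max(0.0, 1.0 - abs(k - c) / width) for k in range(n_bins)]
            for c in centers]

def dct_matrix(n_ceps, n_filters):
    """Unnormalized DCT-II basis used to map log filterbank energies to cepstra."""
    return [[math.cos(math.pi * i * (m + 0.5) / n_filters) for m in range(n_filters)]
            for i in range(n_ceps)]

def propagate_linear(matrix, mean, var):
    """Linear transform y = A x: means map through A, variances through A squared.
    Assumption: inputs are independent, and output covariances are ignored."""
    out_mean = [sum(w * x for w, x in zip(row, mean)) for row in matrix]
    out_var = [sum(w * w * v for w, v in zip(row, var)) for row in matrix]
    return out_mean, out_var

def propagate_log(mean, var):
    """First-order Taylor approximation of y = log(x) around the mean."""
    return [math.log(m) for m in mean], [v / (m * m) for m, v in zip(mean, var)]

def spectrum_to_cepstrum(power_mean, power_var, n_filters=4, n_ceps=3):
    """Carry per-bin mean and variance through filterbank, log, and DCT."""
    fb = triangular_filterbank(n_filters, len(power_mean))
    mel_mean, mel_var = propagate_linear(fb, power_mean, power_var)
    log_mean, log_var = propagate_log(mel_mean, mel_var)
    return propagate_linear(dct_matrix(n_ceps, n_filters), log_mean, log_var)

def uncertain_log_likelihood(feat_mean, feat_var, model_mean, model_var):
    """Diagonal Gaussian score with the feature variance added to the model variance,
    one common way of using uncertain features in a recognizer."""
    total = 0.0
    for x, vx, mu, v in zip(feat_mean, feat_var, model_mean, model_var):
        s = v + vx
        total += -0.5 * (math.log(2 * math.pi * s) + (x - mu) ** 2 / s)
    return total

# Example: bins 8..15 are treated as masked, hence uncertain; bins 0..7 as reliable.
power_mean = [1.0 + 0.1 * k for k in range(16)]
power_var = [0.0 if k < 8 else 0.5 for k in range(16)]
ceps_mean, ceps_var = spectrum_to_cepstrum(power_mean, power_var)
```

In this sketch, a frame with no spectral uncertainty produces zero cepstral variance, so the score reduces to an ordinary Gaussian likelihood. Uncertain bins widen the effective variance and soften the penalty for a mismatch between feature and model.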