Journal: Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on

Bayesian Separation With Sparsity Promotion in Perceptual Wavelet Domain for Speech Enhancement and Hybrid Speech Recognition

Abstract

Removing noise can improve speech recognition accuracy, but errors in the estimated signal components can also degrade recognition. This paper presents a framework of wavelet-based techniques to improve automatic speech recognition performance in the presence of background noise. The proposed robust speech recognition system is realized by implementing speech enhancement preprocessing, feature extraction, and a hybrid speech recognizer in the time–frequency space. A perceptual wavelet filterbank that uses a fixed base to imitate the way humans perceive speech is developed to capture the most discriminative information in the time–frequency plane. To minimize the mismatch between the training and testing conditions of the classifier, the proposed iterative speech enhancement algorithm applies a Bayesian scheme in the wavelet domain to separate the speech and noise components. Nonphonetic information is discarded, while the more critical speech features are extracted and represented by the wavelet coefficients. The denoised wavelet features are fed to a hybrid classifier founded on a hidden Markov model (HMM). The intrinsic limitation of the HMM is overcome by augmenting it with a wavelet support vector machine. This hybrid, hierarchical design improves recognition performance by combining the advantages of different methods in one integrated system. Continuous digit speech recognition experiments conducted with the proposed framework show promising results: the framework significantly improves recognition performance at low signal-to-noise ratio (SNR) without degrading performance at high SNR.
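
The idea of separating speech from noise in a wavelet domain can be illustrated with a small sketch. The code below is not the paper's algorithm: it assumes a one-level orthonormal Haar transform in place of the perceptual wavelet filterbank, zero-mean Gaussian priors for both signal and noise, a known noise variance, and a single pass rather than an iterative scheme. Under those assumptions the Bayesian posterior-mean estimate of each coefficient reduces to a Wiener-style gain applied per subband. All function names are hypothetical.

```python
import math
import random


def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length list."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward exactly (up to floating-point rounding)."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x


def wiener_shrink(coeffs, noise_var):
    """Posterior-mean shrinkage for one subband.

    Assumes coefficient = signal + noise with independent zero-mean
    Gaussian signal and noise; the signal variance is estimated from
    the subband power minus the (assumed known) noise variance.
    """
    power = sum(c * c for c in coeffs) / len(coeffs)
    signal_var = max(power - noise_var, 0.0)
    gain = signal_var / (signal_var + noise_var) if noise_var > 0 else 1.0
    return [gain * c for c in coeffs]


def denoise(noisy, noise_var):
    """Transform, shrink each subband, and transform back."""
    approx, detail = haar_forward(noisy)
    return haar_inverse(wiener_shrink(approx, noise_var),
                        wiener_shrink(detail, noise_var))


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


# Toy signal: a slow sine standing in for speech, plus white Gaussian noise.
rng = random.Random(0)
noise_std = 0.5
clean = [math.sin(2 * math.pi * i / 64) for i in range(256)]
noisy = [c + rng.gauss(0.0, noise_std) for c in clean]
denoised = denoise(noisy, noise_std ** 2)

print("MSE noisy:   ", mse(noisy, clean))
print("MSE denoised:", mse(denoised, clean))
```

Because the Haar transform here is orthonormal, white noise keeps the same variance in every subband, which is why one `noise_var` value can be reused for both the approximation and detail coefficients. A perceptual filterbank with many subbands would follow the same pattern with one gain per band.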
