Journal: Computer Speech and Language

Kalman tracking of linear predictor and harmonic noise models for noisy speech enhancement


Abstract

This paper presents a speech enhancement method based on tracking and denoising the formants of a linear prediction (LP) model of the spectral envelope of speech and the parameters of a harmonic noise model (HNM) of its excitation. The main advantages of tracking and denoising the prominent energy contours of speech are the efficient use of the spectral and temporal structures of successive speech frames and the mitigation of the processing artefact known as 'musical noise' or 'musical tones'. The formant-tracking linear prediction (FTLP) model estimation consists of three stages: (a) speech pre-cleaning based on spectral amplitude estimation, (b) formant tracking across successive speech frames using the Viterbi method, and (c) Kalman filtering of the formant trajectories across successive speech frames. The HNM parameters for the excitation signal comprise the voiced/unvoiced decision, the fundamental frequency, the harmonics' amplitudes and the variance of the noise component of the excitation. A frequency-domain pitch extraction method is proposed that searches for the peak signal-to-noise ratios (SNRs) at the harmonics. For each speech frame, several pitch candidates are calculated. An estimate of the pitch trajectory across successive frames is obtained using a Viterbi decoder. The trajectories of the noisy excitation harmonics across successive speech frames are modeled and denoised using Kalman filters. The proposed method is used to deconstruct noisy speech, denoise its model parameters and then reconstitute speech from its cleaned parts. Experimental evaluations show the performance gains of the formant tracking, pitch extraction and noise reduction stages.
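
To make the idea of Kalman filtering a parameter trajectory across frames concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the state model, so the sketch assumes a scalar random-walk model, and the function names (`kalman_track`, `demo`), the variances and the synthetic 500 to 700 Hz "formant" track are all hypothetical choices made for illustration.

```python
import random


def kalman_track(observations, process_var, measurement_var):
    """Scalar Kalman filter with a random-walk state model.

    Assumed model (an illustration, not taken from the paper):
        x[t] = x[t-1] + w,  w ~ N(0, process_var)
        y[t] = x[t]   + v,  v ~ N(0, measurement_var)
    """
    estimate = observations[0]     # initialise from the first frame
    error_var = measurement_var    # uncertainty of that initial estimate
    track = [estimate]
    for y in observations[1:]:
        # Predict: a random walk keeps the mean and inflates the variance.
        error_var += process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (y - estimate)
        error_var *= 1.0 - gain
        track.append(estimate)
    return track


def mean_squared_error(a, b):
    """Mean squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def demo(seed=0, frames=200):
    """Synthetic test: a slowly rising track observed in Gaussian noise.

    Returns (MSE of the noisy observations, MSE of the filtered track),
    both measured against the clean track.
    """
    rng = random.Random(seed)
    clean = [500.0 + 200.0 * t / (frames - 1) for t in range(frames)]
    noisy = [f + rng.gauss(0.0, 40.0) for f in clean]  # 40 Hz std noise
    tracked = kalman_track(noisy, process_var=25.0, measurement_var=1600.0)
    return mean_squared_error(noisy, clean), mean_squared_error(tracked, clean)


if __name__ == "__main__":
    noisy_mse, tracked_mse = demo()
    print(f"noisy MSE: {noisy_mse:.1f}  tracked MSE: {tracked_mse:.1f}")
```

The ratio of `process_var` to `measurement_var` sets the trade-off: a smaller ratio smooths more but lags behind fast changes in the trajectory, while a larger ratio follows the observations more closely. In the method described above, the same kind of filter would presumably run once per formant or harmonic trajectory, on the values selected by the Viterbi stage.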
