Journal: IEEE Transactions on Audio, Speech, and Language Processing

Pitch Estimation in Noisy Speech Using Accumulated Peak Spectrum and Sparse Estimation Technique



Abstract

Pitch estimation from acoustic signals is a fundamental problem in many areas of speech research. For noise-corrupted speech, reliable pitch estimation is difficult. This paper presents a study of pitch estimation in noisy speech based on a robust temporal-spectral representation and sparse reconstruction. We propose to accumulate spectral peaks over consecutive time frames. Since the harmonic structure of speech changes much more slowly than the noise spectrum, spectral peaks related to pitch harmonics stand out over the noise through the accumulation. Experimental results show that the accumulated peak spectrum is indeed a robust representation of pitch harmonics. Subsequently, the accumulated peak spectrum is expressed as a sparse linear combination of a large set of clean peak spectrum exemplars. A Gaussian mixture density is used to model noise spectrum peaks. The weights of the linear combination are estimated so as to maximize the likelihood of the accumulated peak spectrum under a sparsity constraint. Robust pitch estimation is then performed based on the sparse weights and the corresponding peak spectrum exemplars. The use of a Gaussian mixture model makes the objective function for sparse weight estimation non-convex. By approximation and reformulation, two convex optimization approaches are developed to estimate the weights. Extensive experimental studies are carried out to evaluate the performance of the proposed pitch estimation algorithms under a wide variety of noise conditions. The results show that the proposed methods significantly and consistently outperform the conventional methods, particularly at very low signal-to-noise ratios (e.g., SNR $< -5$ dB).
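The two ideas in the abstract, accumulating spectral peaks across frames and explaining the result as a sparse combination of exemplars, can be illustrated with a small sketch. It is not the paper's algorithm. The exemplars are synthetic harmonic combs rather than peak spectra taken from clean speech, and the weights come from an L1-penalized non-negative least-squares fit (projected ISTA) rather than the Gaussian-mixture likelihood and the two convex reformulations the abstract mentions. All function names, the frame sizes, the candidate grid, and the parameters `n_harm`, `sigma_hz`, and `lam` are assumptions made for illustration.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def peak_spectrum(mag):
    """Keep only the local maxima of a magnitude spectrum; zero elsewhere."""
    peaks = np.zeros_like(mag)
    is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])
    peaks[1:-1][is_peak] = mag[1:-1][is_peak]
    return peaks

def accumulated_peak_spectrum(frames, n_fft):
    """Sum the peak spectra of consecutive frames."""
    window = np.hanning(frames.shape[1])
    acc = np.zeros(n_fft // 2 + 1)
    for frame in frames:
        mag = np.abs(np.fft.rfft(frame * window, n=n_fft))
        acc += peak_spectrum(mag)
    return acc

def harmonic_exemplar(f0, fs, n_fft, n_harm=5, sigma_hz=10.0):
    """Assumed stand-in exemplar: unit-norm comb of Gaussian bumps at h * f0."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    ex = np.zeros_like(freqs)
    for h in range(1, n_harm + 1):
        ex += np.exp(-0.5 * ((freqs - h * f0) / sigma_hz) ** 2)
    return ex / np.linalg.norm(ex)

def sparse_weights(y, D, lam=0.05, n_iter=300):
    """Minimize 0.5 * ||y - D w||^2 + lam * sum(w) subject to w >= 0."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ w - y)
        w = np.maximum(0.0, w - step * (grad + lam))
    return w

# Synthetic test signal: five harmonics of 200 Hz in white noise (assumed setup).
fs, n_fft = 8000, 1024
t = np.arange(int(0.5 * fs)) / fs
rng = np.random.default_rng(0)
clean = sum(np.sin(2 * np.pi * h * 200.0 * t) for h in range(1, 6))
noisy = clean + rng.normal(0.0, 1.0, t.size)

# Accumulate peaks over 50 ms frames with a 25 ms hop, then normalize.
y = accumulated_peak_spectrum(frame_signal(noisy, 400, 200), n_fft)
y /= np.linalg.norm(y)

# Dictionary with one exemplar per candidate pitch on a 5 Hz grid.
candidates = np.arange(80.0, 401.0, 5.0)
D = np.stack([harmonic_exemplar(f0, fs, n_fft) for f0 in candidates], axis=1)

w = sparse_weights(y, D)
f0_est = candidates[np.argmax(w)]  # pitch of the exemplar with the largest weight
print(f0_est)
```

Harmonic peaks fall in nearly the same bins in every frame and so add up, while noise peaks land in different bins each time and stay small, which is the intuition the abstract gives for the accumulated representation. Taking the exemplar with the largest weight is the simplest possible decision rule; the abstract only says that the estimate is based on the sparse weights and their exemplars, so the actual rule may differ.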
