Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Scoring-Based ML Estimation and CRBs for Reverberation, Speech, and Noise PSDs in a Spatially Homogeneous Noise Field


Abstract

Hands-free speech systems suffer performance degradation due to reverberation and noise. Common methods for enhancing reverberant and noisy speech require knowledge of the speech, reverberation, and noise power spectral densities (PSDs). Most literature on this topic assumes that the noise PSD matrix is known. However, in many practical acoustic scenarios, the noise PSD is unknown and should be estimated along with the speech and reverberation PSDs. In this article, the noise is modeled as a spatially homogeneous sound field, with an unknown time-varying PSD multiplied by a known time-invariant spatial coherence matrix. We derive two maximum likelihood estimators (MLEs) for the various PSDs, including the noise PSD. The first is a non-blocking-based estimator that jointly estimates the PSDs of the speech, reverberation, and noise components. The second is a blocking-based estimator that blocks the speech signal and estimates the reverberation and noise PSDs. Since a closed-form solution does not exist, both estimators iteratively maximize the likelihood using the Fisher scoring method. To compare the two methods, the corresponding Cramér-Rao bounds (CRBs) are derived. For both the reverberation and the noise PSDs, it is shown that the non-blocking-based CRB is lower than the blocking-based CRB. Performance evaluation using both simulated and real reverberant and noisy signals shows that the proposed estimators outperform competing estimators and greatly reduce the effect of reverberation and noise.
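
The abstract gives no equations, so the following is only a minimal sketch of how Fisher scoring could be applied to a joint (non-blocking-style) PSD estimate. It is not the authors' implementation. It assumes zero-mean complex Gaussian snapshots whose covariance is a weighted sum of three known matrices: a rank-one speech term, a reverberation coherence matrix, and a noise coherence matrix. The array geometry, the free-field steering vector, the diffuse (sinc) reverberation coherence, the identity noise coherence, and all function and variable names are illustrative assumptions.

```python
import numpy as np

def model_covariance(theta, bases):
    """Phi(theta) = sum_i theta_i * A_i for known Hermitian matrices A_i."""
    return sum(t * A for t, A in zip(theta, bases))

def fisher_information(theta, bases, n_snapshots):
    """FIM for zero-mean complex Gaussian data:
    F_ij = L * tr(Phi^-1 A_i Phi^-1 A_j). Also returns Phi^-1 A_i Phi^-1."""
    phi_inv = np.linalg.inv(model_covariance(theta, bases))
    weighted = [phi_inv @ A @ phi_inv for A in bases]
    fim = np.array([[np.trace(W @ A).real for A in bases] for W in weighted])
    return n_snapshots * fim, weighted

def fisher_scoring(R, bases, n_snapshots, theta0, n_iter=20, floor=1e-12):
    """Iterate theta <- theta + F^-1 * score, given the sample covariance R."""
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_iter):
        fim, weighted = fisher_information(theta, bases, n_snapshots)
        residual = R - model_covariance(theta, bases)
        # Score: d logL / d theta_i = L * tr(Phi^-1 A_i Phi^-1 (R - Phi)).
        score = n_snapshots * np.array(
            [np.trace(W @ residual).real for W in weighted]
        )
        # PSDs are non-negative, so clip each update (an assumed safeguard).
        theta = np.maximum(theta + np.linalg.solve(fim, score), floor)
    return theta

# Hypothetical setup: 4-microphone uniform linear array, one frequency bin.
M, freq, c, L = 4, 1000.0, 343.0, 50
mic_pos = 0.05 * np.arange(M)                      # positions in metres
# Assumed free-field steering vector for a source at 30 degrees.
g = np.exp(-2j * np.pi * freq * mic_pos * np.sin(np.pi / 6) / c)
dist = np.abs(mic_pos[:, None] - mic_pos[None, :])
bases = [
    np.outer(g, g.conj()),          # speech: rank-one term g g^H
    np.sinc(2 * freq * dist / c),   # reverberation: assumed diffuse coherence
    np.eye(M),                      # noise: assumed known coherence (identity)
]

theta_true = np.array([2.0, 0.5, 0.1])   # speech, reverberation, noise PSDs
R = model_covariance(theta_true, bases)  # idealized, noise-free covariance
theta_hat = fisher_scoring(R, bases, L, theta0=np.ones(3))

# CRB on each PSD: diagonal of the inverse FIM at the estimate.
crb = np.diag(np.linalg.inv(fisher_information(theta_hat, bases, L)[0]))
```

Because `R` here lies exactly in the assumed covariance model, the likelihood is maximized at `theta_true`, which makes the sketch easy to check. With a sample covariance computed from a finite number of snapshots, the estimate would instead fluctuate around the true PSDs, and for an unbiased estimator the variance of those fluctuations is bounded below by the `crb` values.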
