IEEE/ACM Transactions on Audio, Speech, and Language Processing

Online MVDR Beamformer Based on Complex Gaussian Mixture Model With Spatial Prior for Noise Robust ASR

Abstract

This paper considers acoustic beamforming for noise robust automatic speech recognition. A beamformer attenuates background noise by enhancing sound components coming from a direction specified by a steering vector. Hence, accurate steering vector estimation is paramount for successful noise reduction. Recently, time–frequency masking has been proposed to estimate the steering vectors used for a beamformer. In particular, we have developed a new form of this approach, which uses a speech spectral model based on a complex Gaussian mixture model (CGMM) to estimate the time–frequency masks needed for steering vector estimation, and extended the CGMM-based beamformer to an online speech enhancement scenario. Our previous experiments showed that the proposed CGMM-based approach outperforms a recently proposed mask estimator based on a Watson mixture model, as well as the baseline speech enhancement system of the CHiME-3 challenge. This paper provides additional experimental results for our online processing, which, with a suitable block-batch size, achieves performance comparable to that of batch processing. This online version reduces the CHiME-3 word error rate (WER) on the evaluation set from 8.37% to 8.06%. Moreover, in this paper, we introduce a probabilistic prior distribution for a spatial correlation matrix (a CGMM parameter), which enables more stable steering vector estimation in the presence of interfering speakers. In practice, the performance of the proposed online beamformer degrades on observations that contain only noise and/or interference, because the CGMM parameter estimation fails in such conditions. The introduced spatial prior keeps the target speaker's parameters from overfitting to noise and/or interference. Experimental results show that, compared with the CGMM-based approach without the prior, the spatial prior reduces the WER in a conversation recognition task from 38.4% to 29.2%, and that it outperforms a conventional online speech enhancement approach.
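To make the processing pipeline described in the abstract concrete, below is a minimal NumPy sketch (not the authors' implementation) of mask-based MVDR beamforming: time–frequency masks, such as the CGMM posteriors mentioned above, weight the observations to form speech and noise spatial correlation matrices; the steering vector is taken as the principal eigenvector of the speech matrix; and the MVDR weights follow from the standard formula w = R_n^{-1} h / (h^H R_n^{-1} h). The array shapes, the diagonal loading term, and all function names here are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of mask-based MVDR beamforming (not the authors' code).
# X: observed multichannel STFT, shape (F, T, M) = (freq bins, frames, mics).
# speech_mask, noise_mask: time-frequency masks in [0, 1], shape (F, T),
# e.g. posteriors obtained from a CGMM as described in the abstract.

def spatial_covariance(X, mask):
    """Mask-weighted spatial correlation matrices, one per frequency, shape (F, M, M)."""
    R = np.einsum('ftm,ftn->fmn', mask[..., None] * X, X.conj())
    return R / np.maximum(mask.sum(axis=1), 1e-8)[:, None, None]

def mvdr_weights(R_speech, R_noise):
    """MVDR weights w = R_n^{-1} h / (h^H R_n^{-1} h), with the steering vector h
    taken as the principal eigenvector of the speech spatial correlation matrix."""
    F, M, _ = R_speech.shape
    W = np.zeros((F, M), dtype=complex)
    for f in range(F):
        # Steering vector estimate: principal eigenvector of R_speech[f].
        _, eigvecs = np.linalg.eigh(R_speech[f])
        h = eigvecs[:, -1]
        # Small diagonal loading keeps the noise covariance invertible.
        Rn_inv_h = np.linalg.solve(R_noise[f] + 1e-6 * np.eye(M), h)
        W[f] = Rn_inv_h / (h.conj() @ Rn_inv_h)
    return W

def apply_beamformer(W, X):
    """Enhanced single-channel STFT, shape (F, T)."""
    return np.einsum('fm,ftm->ft', W.conj(), X)

def enhance(X, speech_mask, noise_mask):
    R_speech = spatial_covariance(X, speech_mask)
    R_noise = spatial_covariance(X, noise_mask)
    return apply_beamformer(mvdr_weights(R_speech, R_noise), X)
```

In the online, block-batch variant described in the paper, the spatial correlation matrices would presumably be accumulated recursively over incoming blocks rather than over a whole utterance, and the spatial prior on the target speaker's correlation matrix would regularize those updates when a block contains only noise and/or interference.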