Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition

Abstract

This paper describes multichannel speech enhancement for improving automatic speech recognition (ASR) in noisy environments. Minimum variance distortionless response (MVDR) beamforming has recently been widely used because it works well if the steering vector of speech and the spatial covariance matrix (SCM) of noise are given. To estimate such spatial information, conventional studies take a supervised approach that classifies each time-frequency (TF) bin as noise or speech by training a deep neural network (DNN). The performance of ASR, however, is degraded in an unknown noisy environment. To solve this problem, we take an unsupervised approach that decomposes each TF bin into the sum of speech and noise by using multichannel nonnegative matrix factorization (MNMF). This enables us to accurately estimate the SCMs of speech and noise not from observed noisy mixtures but from separated speech and noise components. In this paper, we propose online MVDR beamforming by effectively initializing and incrementally updating the parameters of MNMF. Another main contribution is a comprehensive investigation of the ASR performance obtained with various types of spatial filters, i.e., time-invariant and time-variant versions of MVDR beamformers and of rank-1 and full-rank multichannel Wiener filters, in combination with MNMF. The experimental results showed that the proposed method outperformed the state-of-the-art DNN-based beamforming method in unknown environments that did not match the training data.
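
The abstract notes that MVDR beamforming needs the steering vector of speech and the SCM of noise. As a minimal sketch of that step for a single frequency bin, the NumPy code below computes time-invariant MVDR weights from a speech SCM and a noise SCM. It is not the paper's implementation: the function names `mvdr_weights` and `make_toy_scms`, the random toy SCMs, and the choice to take the steering vector as the principal eigenvector of the speech SCM are all assumptions made for illustration. In the paper's setting the two SCMs would come from the MNMF-separated speech and noise components.

```python
import numpy as np


def mvdr_weights(speech_scm, noise_scm):
    """Return MVDR weights (shape (M,)) for one frequency bin.

    speech_scm, noise_scm: (M, M) Hermitian matrices; noise_scm must be
    positive definite so the linear solve below is well posed.
    """
    # Assumption: the steering vector is taken as the principal eigenvector
    # of the speech SCM (eigh returns eigenvalues in ascending order).
    _, eigvecs = np.linalg.eigh(speech_scm)
    steering = eigvecs[:, -1]
    # Solve noise_scm @ x = steering instead of forming an explicit inverse.
    numerator = np.linalg.solve(noise_scm, steering)
    # w = R_n^{-1} h / (h^H R_n^{-1} h), which enforces w^H h = 1.
    return numerator / (steering.conj() @ numerator)


def make_toy_scms(n_mics=4, seed=0):
    """Build hypothetical SCMs: rank-1 speech and full-rank noise."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(n_mics) + 1j * rng.standard_normal(n_mics)
    speech_scm = np.outer(h, h.conj())
    a = rng.standard_normal((n_mics, n_mics)) \
        + 1j * rng.standard_normal((n_mics, n_mics))
    noise_scm = a @ a.conj().T + 1e-3 * np.eye(n_mics)
    return speech_scm, noise_scm


speech_scm, noise_scm = make_toy_scms()
w = mvdr_weights(speech_scm, noise_scm)
# Enhanced output for a multichannel observation x in this bin: y = w^H x.
```

With these weights the speech direction passes with unit gain while the residual noise power, `w.conj() @ noise_scm @ w`, is the smallest achievable under that constraint. A time-variant version would recompute the weights as the SCM estimates are updated.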