Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition

Shimada Kazuki; Bando Yoshiaki; Mimura Masato; Itoyama Katsutoshi; Yoshii Kazuyoshi; Kawahara Tatsuya


Abstract

This paper describes multichannel speech enhancement for improving automatic speech recognition (ASR) in noisy environments. Recently, minimum variance distortionless response (MVDR) beamforming has been widely used because it works well if the steering vector of speech and the spatial covariance matrix (SCM) of noise are given. To estimate such spatial information, conventional studies take a supervised approach that classifies each time-frequency (TF) bin into noise or speech by training a deep neural network (DNN). ASR performance, however, degrades in unknown noisy environments. To solve this problem, we take an unsupervised approach that decomposes each TF bin into the sum of speech and noise by using multichannel nonnegative matrix factorization (MNMF). This enables us to accurately estimate the SCMs of speech and noise not from observed noisy mixtures but from separated speech and noise components. In this paper, we propose online MVDR beamforming by effectively initializing and incrementally updating the parameters of MNMF. Another main contribution is a comprehensive investigation of the ASR performance obtained with various types of spatial filters, i.e., time-invariant and time-variant versions of MVDR beamformers and of rank-1 and full-rank multichannel Wiener filters, in combination with MNMF. The experimental results showed that the proposed method outperformed the state-of-the-art DNN-based beamforming method in unknown environments that did not match the training data.
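The core filtering step can be illustrated at a single frequency bin. The following is a minimal NumPy sketch, assuming the separated speech and noise components are already available; the MNMF separation itself is not implemented, and synthetic data stands in for its output. All function names are hypothetical. The sketch estimates both SCMs from the separated components, takes the principal eigenvector of the speech SCM as the steering vector (a common choice, assumed here rather than taken from the paper), and builds a time-invariant MVDR beamformer and a full-rank multichannel Wiener filter.

```python
import numpy as np


def estimate_scm(components: np.ndarray) -> np.ndarray:
    """SCM (M x M) from a (T, M) array of STFT coefficients at one frequency bin."""
    num_frames = components.shape[0]
    # R = (1/T) * sum_t z_t z_t^H, with z_t the M-channel vector at frame t
    return components.T @ components.conj() / num_frames


def steering_vector(speech_scm: np.ndarray) -> np.ndarray:
    """Unit-norm principal eigenvector of the speech SCM (assumed steering estimate)."""
    _, eigvecs = np.linalg.eigh(speech_scm)  # eigenvalues are returned in ascending order
    return eigvecs[:, -1]


def mvdr_weights(steering: np.ndarray, noise_scm: np.ndarray) -> np.ndarray:
    """MVDR filter w = R_n^{-1} d / (d^H R_n^{-1} d), which satisfies w^H d = 1."""
    rn_inv_d = np.linalg.solve(noise_scm, steering)
    return rn_inv_d / (steering.conj() @ rn_inv_d)


def full_rank_mwf_weights(speech_scm: np.ndarray, noise_scm: np.ndarray, ref_mic: int = 0) -> np.ndarray:
    """Full-rank multichannel Wiener filter w = (R_s + R_n)^{-1} R_s e_ref for the speech image at ref_mic."""
    return np.linalg.solve(speech_scm + noise_scm, speech_scm[:, ref_mic])


def apply_filter(weights: np.ndarray, mixture: np.ndarray) -> np.ndarray:
    """Enhanced signal y_t = w^H x_t for every frame of a (T, M) mixture."""
    return mixture @ weights.conj()


# Hypothetical synthetic stand-in for MNMF output at one frequency bin.
rng = np.random.default_rng(0)
num_frames, num_mics = 2000, 4
true_steering = rng.standard_normal(num_mics) + 1j * rng.standard_normal(num_mics)
source = rng.standard_normal(num_frames) + 1j * rng.standard_normal(num_frames)
speech = np.outer(source, true_steering)  # rank-1 speech image, shape (T, M)
mixing = rng.standard_normal((num_mics, num_mics)) + 1j * rng.standard_normal((num_mics, num_mics))
white = rng.standard_normal((num_frames, num_mics)) + 1j * rng.standard_normal((num_frames, num_mics))
noise = white @ mixing.T  # spatially correlated noise, shape (T, M)
mixture = speech + noise

# The SCMs come from the separated components, not from the noisy mixture.
speech_scm, noise_scm = estimate_scm(speech), estimate_scm(noise)
d = steering_vector(speech_scm)
w_mvdr = mvdr_weights(d, noise_scm)
w_mwf = full_rank_mwf_weights(speech_scm, noise_scm)
enhanced_mvdr = apply_filter(w_mvdr, mixture)
enhanced_mwf = apply_filter(w_mwf, mixture)
```

Because both SCMs are averaged over all frames, these filters are time-invariant. A time-variant or online version would instead update the statistics frame by frame. Replacing R_s in the Wiener filter with a rank-1 model φ_s d d^H gives the rank-1 variant.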


Bibliographic record

Source: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, no. 5, pp. 960-971 (12 pages)
Authors: Shimada Kazuki; Bando Yoshiaki; Mimura Masato; Itoyama Katsutoshi; Yoshii Kazuyoshi; Kawahara Tatsuya
Affiliations

Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan;

Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan;

Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan;

Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan;

Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan|RIKEN, Ctr Adv Intelligence Project, Tokyo 1030027, Japan;

Kyoto Univ, Grad Sch Informat, Kyoto 6068501, Japan;

Record information
Original format: PDF
Text language: English
Keywords
Noisy speech recognition; speech enhancement; multichannel nonnegative matrix factorization; beamforming

Similar documents

1. Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition [J]. Shimada Kazuki, Bando Yoshiaki, Mimura Masato. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, no. 5.
2. Real-time frequency-based noise-robust Automatic Speech Recognition using Multi-Nets Artificial Neural Networks: A multi-views multi-learners approach [J]. Seyed Reza Shahamiri, Siti Salwah Binti Salim. Neurocomputing, 2014, issue apra10.
3. An FFT-Based Companding Front End for Noise-Robust Automatic Speech Recognition [J]. Bhiksha Raj, Lorenzo Turicchia, Bent Schmidt-Nielsen. EURASIP Journal on Audio, Speech, and Music Processing, 2007, no. 1.
4. Unsupervised Beamforming Based on Multichannel Nonnegative Matrix Factorization for Noisy Speech Recognition [C]. Kazuki Shimada, Yoshiaki Bando, Masato Mimura. IEEE International Conference on Acoustics, Speech and Signal Processing, 2018.
5. High-performance automatic speech recognition via enhanced front-end analysis and acoustic modeling [D]. Gu, Liang. 2001.
6. A Speech Recognition-based Solution for the Automatic Detection of Mild Cognitive Impairment from Spontaneous Speech [O]. László Tóth, Ildikó Hoffmann, Gábor Gosztolya.
7. Exemplar-based speech enhancement and its application to noise-robust automatic speech recognition [O]. Gemmeke Jort, Virtanen Tuomas, Hurmalainen Antti. 2011.
