EURASIP Journal on Advances in Signal Processing

Front-end technologies for robust ASR in reverberant environments—spectral enhancement-based dereverberation and auditory modulation filterbank features



Abstract

This paper presents extended techniques aimed at improving automatic speech recognition (ASR) in single-channel scenarios in the context of the REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge. The focus lies on the development and analysis of ASR front-end technologies covering speech enhancement and feature extraction. Speech enhancement is performed with a joint noise reduction and dereverberation system operating in the spectral domain, driven by estimates of the noise and late reverberation power spectral densities (PSDs). To obtain reliable PSD estimates, even in acoustic conditions with positive direct-to-reverberation energy ratios (DRRs), we adopt a statistical model of the room impulse response that explicitly incorporates the DRR, in combination with a novel joint estimator for the reverberation time T60 and the DRR. The feature extraction approach is inspired by processing strategies of the auditory system, in which an amplitude modulation filterbank is applied to extract temporal modulation information. These techniques were shown to improve on the REVERB baseline in our previous work. Here, we investigate whether similar improvements are obtained with a state-of-the-art ASR framework, and to what extent the results depend on the specific architecture of the back-end. Apart from conventional Gaussian mixture model (GMM)-hidden Markov model (HMM) back-ends, we consider subspace GMM (SGMM)-HMMs as well as deep neural networks in a hybrid system. The speech enhancement algorithm proves helpful in almost all conditions, with the exception of deep learning systems under matched training-test conditions. The auditory feature type improves the baseline for all system architectures. Combining our front-end techniques with current back-ends yields an average relative word error rate reduction of 52.7% on the REVERB evaluation test set compared to our original REVERB result.
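The enhancement stage described above relies on two quantities estimated in the spectral domain: the noise PSD and the late-reverberation PSD, the latter derived from a statistical room impulse response model parameterized by the reverberation time T60 and the DRR. As a rough, hypothetical illustration of the general idea (not the authors' estimator; the DRR-aware model and the joint T60/DRR estimation are omitted), the Python sketch below computes a Lebart-style late-reverberation PSD from an assumed T60 value and applies a simple Wiener-type gain. All function names and parameters are invented for this sketch.

import numpy as np

def late_reverb_psd(spec_psd, t60, frame_shift_s, delay_frames=8):
    # Lebart-style estimate: under an exponentially decaying (Polack-type)
    # room impulse response model, the late-reverberation PSD is an
    # attenuated, delayed copy of the observed reverberant-speech PSD.
    # spec_psd: (frames, bins) short-time PSD; t60 and frame_shift_s in seconds.
    decay = 6.0 * np.log(10.0) / t60                       # energy decay rate implied by T60
    atten = np.exp(-decay * delay_frames * frame_shift_s)  # attenuation over the early/late split
    psd_late = np.zeros_like(spec_psd)
    psd_late[delay_frames:] = atten * spec_psd[:-delay_frames]
    return psd_late

def wiener_gain(spec_psd, noise_psd, late_psd, gain_floor=0.1):
    # Wiener-type gain suppressing both additive noise and late reverberation.
    interference = noise_psd + late_psd
    gain = 1.0 - interference / np.maximum(spec_psd, 1e-12)
    return np.maximum(gain, gain_floor)

Such a gain would be applied frame by frame to the short-time spectrum of the reverberant signal before resynthesis or feature extraction; the gain floor limits musical-noise artifacts.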
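The auditory-inspired feature extraction applies an amplitude modulation filterbank to the temporal trajectories of sub-band energies. The sketch below is a minimal, hypothetical version of that idea, assuming librosa for the mel spectrogram: log mel-band energy trajectories are filtered with a few Hann-windowed complex modulation filters and the filter-output magnitudes are stacked as features. The modulation centre frequencies, filter shapes, and band counts are illustrative and do not reproduce the published AMFB design.

import numpy as np
import librosa

def amfb_features(wav, sr, hop_s=0.010, win_s=0.025, n_mels=23,
                  mod_freqs=(0.0, 2.0, 4.0, 8.0, 16.0), mod_win_s=0.2):
    # Toy amplitude-modulation-filterbank features on log mel-band energies.
    hop, win = int(sr * hop_s), int(sr * win_s)
    mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_fft=win,
                                         hop_length=hop, n_mels=n_mels, power=2.0)
    log_mel = np.log(mel + 1e-10)                  # (n_mels, frames)

    frame_rate = 1.0 / hop_s                       # trajectories sampled at the frame rate
    taps = int(mod_win_s * frame_rate) | 1         # odd filter length in frames
    n = np.arange(taps) - taps // 2
    window = np.hanning(taps)

    feats = []
    for f_mod in mod_freqs:
        # complex modulation band-pass filter: Hann-windowed complex exponential
        h = window * np.exp(2j * np.pi * f_mod * n / frame_rate)
        band = np.array([np.convolve(traj, h, mode="same") for traj in log_mel])
        feats.append(np.abs(band))                 # modulation magnitude per mel band
    return np.concatenate(feats, axis=0)           # (n_mels * len(mod_freqs), frames)

Low modulation frequencies (a few Hz) are emphasized because envelope fluctuations around the syllabic rate carry most of the linguistic information in the speech signal.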
