EURASIP Journal on Advances in Signal Processing

Front-end technologies for robust ASR in reverberant environments—spectral enhancement-based dereverberation and auditory modulation filterbank features



Abstract

This paper presents extended techniques aimed at improving automatic speech recognition (ASR) in single-channel scenarios in the context of the REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge. The focus lies on the development and analysis of ASR front-end technologies covering speech enhancement and feature extraction. Speech enhancement is performed with a joint noise reduction and dereverberation system operating in the spectral domain, driven by estimates of the noise and late reverberation power spectral densities (PSDs). To obtain reliable PSD estimates, even in acoustic conditions with positive direct-to-reverberation energy ratios (DRRs), we adopt a statistical model of the room impulse response that explicitly incorporates the DRR, in combination with a novel joint estimator for the reverberation time T60 and the DRR. The feature extraction approach is inspired by processing strategies of the auditory system, in which an amplitude modulation filterbank is applied to extract temporal modulation information. These techniques were shown to improve on the REVERB baseline in our previous work. Here, we investigate whether similar improvements are obtained with a state-of-the-art ASR framework, and to what extent the results depend on the specific architecture of the back-end. Apart from conventional Gaussian mixture model (GMM)-hidden Markov model (HMM) back-ends, we consider subspace GMM (SGMM)-HMMs as well as deep neural networks in a hybrid system. The speech enhancement algorithm proves helpful in almost all conditions, with the exception of deep learning systems under matched training-test conditions. The auditory feature type improves the baseline for all system architectures. Combining our front-end techniques with current back-ends yields an average relative word error rate reduction of 52.7% on the REVERB evaluation test set compared to our original REVERB result.
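The enhancement stage described above relies on two quantities estimated in the spectral domain: the noise PSD and the late-reverberation PSD, the latter derived from a statistical room impulse response model parameterized by the reverberation time T60 and the DRR. As a rough, hypothetical illustration of the general idea (not the authors' estimator; the DRR-aware model and the joint T60/DRR estimation are omitted), the Python sketch below computes a Lebart-style late-reverberation PSD from an assumed T60 value and applies a simple Wiener-type gain. All function names and parameters are invented for this sketch.

import numpy as np

def late_reverb_psd(spec_psd, t60, frame_shift_s, delay_frames=8):
    # Lebart-style estimate: under an exponentially decaying (Polack-type)
    # room impulse response model, the late-reverberation PSD is an
    # attenuated, delayed copy of the observed reverberant-speech PSD.
    # spec_psd: (frames, bins) short-time PSD; t60 and frame_shift_s in seconds.
    decay = 6.0 * np.log(10.0) / t60                       # energy decay rate implied by T60
    atten = np.exp(-decay * delay_frames * frame_shift_s)  # attenuation over the early/late split
    psd_late = np.zeros_like(spec_psd)
    psd_late[delay_frames:] = atten * spec_psd[:-delay_frames]
    return psd_late

def wiener_gain(spec_psd, noise_psd, late_psd, gain_floor=0.1):
    # Wiener-type gain suppressing both additive noise and late reverberation.
    interference = noise_psd + late_psd
    gain = 1.0 - interference / np.maximum(spec_psd, 1e-12)
    return np.maximum(gain, gain_floor)

Such a gain would be applied frame by frame to the short-time spectrum of the reverberant signal before resynthesis or feature extraction; the gain floor limits musical-noise artifacts.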
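The auditory-inspired feature extraction applies an amplitude modulation filterbank to the temporal trajectories of sub-band energies. The sketch below is a minimal, hypothetical version of that idea, assuming librosa for the mel spectrogram: log mel-band energy trajectories are filtered with a few Hann-windowed complex modulation filters and the filter-output magnitudes are stacked as features. The modulation centre frequencies, filter shapes, and band counts are illustrative and do not reproduce the published AMFB design.

import numpy as np
import librosa

def amfb_features(wav, sr, hop_s=0.010, win_s=0.025, n_mels=23,
                  mod_freqs=(0.0, 2.0, 4.0, 8.0, 16.0), mod_win_s=0.2):
    # Toy amplitude-modulation-filterbank features on log mel-band energies.
    hop, win = int(sr * hop_s), int(sr * win_s)
    mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_fft=win,
                                         hop_length=hop, n_mels=n_mels, power=2.0)
    log_mel = np.log(mel + 1e-10)                  # (n_mels, frames)

    frame_rate = 1.0 / hop_s                       # trajectories sampled at the frame rate
    taps = int(mod_win_s * frame_rate) | 1         # odd filter length in frames
    n = np.arange(taps) - taps // 2
    window = np.hanning(taps)

    feats = []
    for f_mod in mod_freqs:
        # complex modulation band-pass filter: Hann-windowed complex exponential
        h = window * np.exp(2j * np.pi * f_mod * n / frame_rate)
        band = np.array([np.convolve(traj, h, mode="same") for traj in log_mel])
        feats.append(np.abs(band))                 # modulation magnitude per mel band
    return np.concatenate(feats, axis=0)           # (n_mels * len(mod_freqs), frames)

Low modulation frequencies (a few Hz) are emphasized because envelope fluctuations around the syllabic rate carry most of the linguistic information in the speech signal.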
