Journal: EURASIP Journal on Advances in Signal Processing

Effectiveness of dereverberation, feature transformation, discriminative training methods, and system combination approach for various reverberant environments

Abstract

The recently released REverberant Voice Enhancement and Recognition Benchmark (REVERB) challenge includes a reverberant automatic speech recognition (ASR) task. This paper describes our proposed system, which is based on multichannel speech enhancement preprocessing and state-of-the-art ASR techniques. For preprocessing, we propose a single-channel dereverberation method with reverberation time estimation, combined with multichannel beamforming that enhances the direct sound relative to the reflected sound. The paper also focuses on state-of-the-art ASR techniques, such as discriminative training of acoustic models, including the Gaussian mixture model, subspace Gaussian mixture model, and deep neural networks, as well as various feature transformation techniques. The REVERB challenge requires handling various acoustic environments, but a single ASR system tends to be overly tuned to a specific environment, which degrades performance in mismatched environments. To overcome this mismatch problem, we use a system combination approach with multiple ASR systems that have different features and different model types, because combining systems with different error patterns is beneficial. In particular, we use our discriminative training technique for system combination, which achieves better generalization by making the systems complementary through modified discriminative criteria. Experiments show the effectiveness of these approaches, reaching word error rates of 6.76% and 18.60% on the REVERB simulated and real test sets, respectively. These are relative improvements of 68.8% and 61.5% over the baseline.
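To make the reported relative improvements concrete, the following minimal Python sketch shows how a relative word error rate (WER) reduction is usually computed, and back-calculates the baseline WERs that the abstract's figures would imply. The function names are hypothetical, and the baseline values are not stated in the abstract; they are derived here from rounded percentages and are only approximate.

```python
def relative_improvement(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction over the baseline, in percent."""
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


def implied_baseline(system_wer: float, rel_improvement_pct: float) -> float:
    """Baseline WER implied by a system WER and a relative improvement.

    Inverts relative_improvement(); the result is an estimate because the
    reported percentages are rounded.
    """
    return system_wer / (1.0 - rel_improvement_pct / 100.0)


# Figures from the abstract: system WER and relative improvement (both in %).
sim_baseline = implied_baseline(6.76, 68.8)    # simulated test set
real_baseline = implied_baseline(18.60, 61.5)  # real test set

print(f"Implied baseline WER (simulated): {sim_baseline:.1f}%")
print(f"Implied baseline WER (real):      {real_baseline:.1f}%")
```

Under these assumptions, the implied baselines come out to roughly 21.7% (simulated) and 48.3% (real), which illustrates how much harder the real recordings are for both the baseline and the proposed system.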
