Multi-Channel Speech Enhancement and Amplitude Modulation Analysis for Noise Robust Automatic Speech Recognition

Abstract

The paper describes an automatic speech recognition (ASR) system that is benchmarked on data from the 3rd CHiME challenge, a dataset of distant-microphone recordings of noisy acoustic scenes in public environments. The proposed system combines several methods to increase recognition accuracy and noise robustness. Two multi-channel speech enhancement techniques are used to suppress interfering sounds in the audio stream. The first separates the target speaker's voice from background sources using non-negative matrix factorization (NMF), with variational Bayesian (VB) inference used to estimate the NMF parameters. The second is a time-varying minimum variance distortionless response (MVDR) beamformer that uses spatial information to suppress sound signals not arriving from a desired direction. Prior to speech enhancement, a microphone channel failure detector is applied that cross-compares channels using a modulation-spectral representation of the speech signal. ASR feature extraction employs an amplitude modulation filter bank (AMFB), which incorporates prior knowledge about speech to analyze its temporal dynamics. In conjunction with a deep neural network (DNN) based ASR system, AMFBs outperform the commonly used frame splicing of filter bank features, which represents an equivalent data-driven approach to extracting modulation-spectral information. In addition, features are speaker adapted, a recurrent neural network (RNN) is employed for language modeling, and hypotheses of different ASR systems are combined to further improve recognition accuracy. The proposed ASR system achieves a word error rate (WER) of 5.67% on the real evaluation test data, which is 0.16 percentage points lower than the best score reported within the 3rd CHiME challenge.
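To illustrate the beamforming idea mentioned in the abstract, the following is a minimal sketch of the textbook MVDR weight formula w = R^{-1} d / (d^H R^{-1} d) for a single frequency bin. It is not the authors' implementation: the abstract describes a time-varying beamformer, whereas this sketch uses one fixed noise covariance matrix. The uniform linear array, microphone spacing, frequency, source angles, and noise powers are all hypothetical values chosen only for the demonstration, and the helper names (`solve_linear`, `steering_vector`, `outer_cov`) are illustrative.

```python
import cmath
import math


def solve_linear(a, b):
    """Solve a @ x = b for a complex square matrix (list of lists)
    using Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented copy so the caller's data is not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or ill-conditioned")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    x = [0j] * n
    for row in range(n - 1, -1, -1):
        acc = m[row][n] - sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = acc / m[row][row]
    return x


def steering_vector(num_mics, spacing_m, angle_rad, freq_hz, speed=343.0):
    """Far-field steering vector of a uniform linear array
    (assumed geometry, not taken from the paper)."""
    delay = spacing_m * math.sin(angle_rad) / speed
    return [cmath.exp(-2j * math.pi * freq_hz * i * delay)
            for i in range(num_mics)]


def outer_cov(vec, power):
    """Rank-one covariance power * v v^H of a single point source."""
    return [[power * a * b.conjugate() for b in vec] for a in vec]


def mvdr_weights(noise_cov, steering):
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d)."""
    r_inv_d = solve_linear(noise_cov, steering)
    denom = sum(d.conjugate() * v for d, v in zip(steering, r_inv_d))
    return [v / denom for v in r_inv_d]


def response(weights, steering):
    """Beamformer response w^H d toward a given direction."""
    return sum(w.conjugate() * d for w, d in zip(weights, steering))


# Hypothetical scene: 4 microphones, 5 cm apart, analysed at 1 kHz.
num_mics = 4
target = steering_vector(num_mics, 0.05, 0.0, 1000.0)
interferer = steering_vector(num_mics, 0.05, math.radians(60.0), 1000.0)

# Noise covariance: one strong interferer plus weak sensor noise.
noise_cov = outer_cov(interferer, 10.0)
for i in range(num_mics):
    noise_cov[i][i] += 0.1

weights = mvdr_weights(noise_cov, target)
target_gain = abs(response(weights, target))          # close to 1
interferer_gain = abs(response(weights, interferer))  # much smaller than 1
print(target_gain, interferer_gain)
```

The distortionless constraint shows up as a unit response toward the target direction, while the response toward the interferer is strongly attenuated. A time-varying variant would re-estimate the noise covariance and recompute the weights as the scene changes.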

Publication details

  • Source
    Computer Speech and Language | 2017, Issue 11 | pp. 558-573 | 16 pages
  • Author affiliations

    Fraunhofer IDMT, Project Group for Hearing, Speech, and Audio Technology, Marie-Curie-Str. 2, Oldenburg, Germany, Cluster of Excellence, Hearing4all, Oldenburg, Germany;

    Hörtech gGmbH, Marie-Curie-Str. 2, Oldenburg, Germany, Cluster of Excellence, Hearing4all, Oldenburg, Germany;

    University of Oldenburg, Medizinische Physik, Carl-von-Ossietzky-Str. 9-11, Oldenburg, Germany, Cluster of Excellence, Hearing4all, Oldenburg, Germany;

    Fraunhofer IDMT, Project Group for Hearing, Speech, and Audio Technology, Marie-Curie-Str. 2, Oldenburg, Germany, Cluster of Excellence, Hearing4all, Oldenburg, Germany;

    Fraunhofer IDMT, Project Group for Hearing, Speech, and Audio Technology, Marie-Curie-Str. 2, Oldenburg, Germany, Hörtech gGmbH, Marie-Curie-Str. 2, Oldenburg, Germany, University of Oldenburg, Medizinische Physik, Carl-von-Ossietzky-Str. 9-11, Oldenburg, Germany, Cluster of Excellence, Hearing4all, Oldenburg, Germany;

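  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Full-text language: English
  • Chinese Library Classification
  • Keywords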

    Amplitude modulation filter bank; CHiME; Feature extraction; Modulation frequency analysis; Non-negative matrix factorization; Speech enhancement;
