A stereophonic acoustic signal extraction scheme for noisy and reverberant environments

Abstract

In this contribution, a novel two-channel acoustic front-end is proposed for robust automatic speech recognition in adverse acoustic environments with nonstationary interference and reverberation. From a multiple-input single-output (MISO) system perspective, a statistically optimum source signal extraction scheme based on the multichannel Wiener filter (MWF) is discussed for application in noisy and under-determined scenarios. Under free-field and diffuse noise conditions, this optimum scheme reduces to a Delay & Sum beamformer followed by a single-channel Wiener postfilter. Scenarios with multiple simultaneously active interfering sources and background noise are usually modeled by a diffuse noise field. In reality, however, the free-field assumption is very weak because of the reverberant nature of acoustic environments. We therefore propose to estimate this simplified MWF solution separately in each frequency bin to cope with reverberation. We show that this approach can be realized very efficiently by combining two components. The first is a blocking matrix based on semi-blind source separation ('directional BSS'). It provides a continuously updated reference of all undesired noise and interference components, separated from the desired source and its reflections. The second is a single-channel Wiener postfilter. Moreover, it is shown how the obtained reference signal of all undesired components can be used efficiently to realize the Wiener postfilter, in a way that generalizes well-known postfilter realizations. The proposed front-end and its integration into an automatic speech recognition (ASR) system are analyzed and evaluated in noisy living-room-like environments following the PASCAL CHiME challenge. A comparison with a simplified front-end based on a free-field assumption shows that the introduced system substantially improves both speech quality and recognition performance under the considered adverse conditions.
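
The signal flow described above can be illustrated per frequency bin. This is a minimal sketch under loudly stated assumptions, and it is not the authors' method. It assumes an ideal free-field two-microphone model with a known source delay `tau`. It uses recursive PSD smoothing with a hypothetical factor `alpha` and a hypothetical gain floor `g_min`. The blocking branch is a plain delay-and-subtract difference, which stands in for the directional-BSS blocking matrix the paper proposes. It therefore resembles a free-field simplification rather than the proposed system. The names `BinState` and `process_bin` are illustrative only.

```python
import cmath
from dataclasses import dataclass


@dataclass
class BinState:
    """Recursively smoothed PSD estimates for one frequency bin (illustrative)."""
    phi_yy: float = 0.0  # PSD of the Delay & Sum output
    phi_nn: float = 0.0  # PSD of the noise/interference reference


def process_bin(x1, x2, omega, tau, state, alpha=0.9, g_min=0.1, eps=1e-12):
    """Enhance one STFT frame of one frequency bin for two microphones.

    x1, x2 : complex STFT coefficients of microphones 1 and 2
    omega  : angular frequency of the bin (rad/s)
    tau    : assumed free-field delay of the desired source at mic 2 (s)
    Returns (enhanced coefficient, Wiener gain); `state` is updated in place.
    """
    # Time-align mic 2 to mic 1 by undoing the assumed free-field phase shift.
    x2_aligned = x2 * cmath.exp(1j * omega * tau)
    # Delay & Sum beamformer output.
    y = 0.5 * (x1 + x2_aligned)
    # Delay & subtract "blocking matrix": cancels the aligned desired source
    # and keeps a reference of the remaining components. This is a simple
    # substitute for the directional-BSS blocking matrix, NOT that method.
    n = 0.5 * (x1 - x2_aligned)
    # First-order recursive smoothing of both PSDs.
    state.phi_yy = alpha * state.phi_yy + (1 - alpha) * abs(y) ** 2
    state.phi_nn = alpha * state.phi_nn + (1 - alpha) * abs(n) ** 2
    # Single-channel Wiener-type postfilter gain driven by the noise
    # reference, floored at g_min to limit musical noise.
    gain = max(1.0 - state.phi_nn / (state.phi_yy + eps), g_min)
    return gain * y, gain
```

In use, one `BinState` is kept per frequency bin, and `process_bin` is called once per STFT frame. The difference-based reference has the same noise PSD as the Delay & Sum output only for spatially uncorrelated noise. In diffuse or reverberant fields it is biased, and the desired source's reflections leak into it. This leakage is the shortcoming that the abstract's per-bin, BSS-based blocking matrix is meant to address.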

Bibliographic details

  • Source
    Computer Speech and Language | 2013, Issue 3 | pp. 726-745 | 20 pages
  • Author affiliations

    Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Cauerstr. 7, 91058 Erlangen, Germany (same affiliation listed for all seven authors)

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Text language: English
  • Keywords

    blind source extraction; speech enhancement; robust automatic speech recognition; PASCAL CHiME challenge