A hearing-inspired approach for distant-microphone speech recognition in the presence of multiple sources

Abstract

This paper addresses the problem of speech recognition in reverberant multisource noise conditions using distant binaural microphones. Our scheme employs a two-stage fragment decoding approach inspired by Bregman's account of auditory scene analysis, in which innate primitive grouping 'rules' are balanced by the role of learnt schema-driven processes. First, the acoustic mixture is split into local time-frequency fragments of individual sound sources using signal-level primitive grouping cues. Second, statistical models are employed to select fragments belonging to the sound source of interest, and the hypothesis-driven stage simultaneously searches for the most probable speech/background segmentation and the corresponding acoustic model state sequence. The paper reports recent advances in combining adaptive noise floor modelling and binaural localisation cues within this framework. By integrating signal-level grouping cues with acoustic models of the target sound source in a probabilistic framework, the system is able to simultaneously separate and recognise the sound of interest from the mixture, and derive significant recognition performance benefits from different grouping cue estimates despite their inherent unreliability in noisy conditions. Finally, the paper will show that missing data imputation can be applied via fragment decoding to allow reconstruction of a clean spectrogram that can be further processed and used as input to conventional ASR systems. The best performing system achieves an average keyword recognition accuracy of 85.83% on the PASCAL CHiME Challenge task.
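
The second stage described in the abstract (choosing which fragments belong to the target while scoring them against a speech model) can be illustrated with a toy sketch. This is not the authors' decoder: it assumes a single diagonal Gaussian per frequency channel in place of the HMM acoustic models, an exhaustive search over fragment labellings in place of a joint search over segmentations and state sequences, and hand-made fragments and energies. The function names and all numbers are hypothetical. Cells labelled as speech are scored with the Gaussian density; cells labelled as background are scored with the probability that the speech energy lies below the observed energy, a simplified form of missing-data bounded marginalisation.

```python
import itertools
import math


def log_gauss_pdf(x, mean, std):
    """Log density of a Gaussian at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def log_gauss_cdf(x, mean, std):
    """Log probability that a Gaussian variable is <= x (floored to avoid log(0))."""
    p = 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))
    return math.log(max(p, 1e-300))


def score_labelling(obs, fragments, labels, mean, std):
    """Score one speech/background labelling of the fragments.

    obs:       dict mapping (frame, channel) -> observed log energy
    fragments: list of fragments, each a list of (frame, channel) cells
    labels:    one bool per fragment, True meaning "belongs to the speech source"
    mean, std: per-channel parameters of the toy speech model (assumed)
    """
    total = 0.0
    for cells, is_speech in zip(fragments, labels):
        for frame, channel in cells:
            x = obs[(frame, channel)]
            if is_speech:
                # Reliable cell: the observation is the speech energy itself.
                total += log_gauss_pdf(x, mean[channel], std[channel])
            else:
                # Masked cell: speech energy is only known to be below x.
                total += log_gauss_cdf(x, mean[channel], std[channel])
    return total


def decode_fragments(obs, fragments, mean, std):
    """Return the best-scoring labelling by brute force (2**N candidates)."""
    candidates = itertools.product([False, True], repeat=len(fragments))
    return max(
        candidates,
        key=lambda labels: score_labelling(obs, fragments, labels, mean, std),
    )


# Toy data: two frames, two channels, two fragments (all values invented).
mean = [10.0, 4.0]   # speech model expects high energy in channel 0, low in channel 1
std = [0.5, 0.5]
obs = {(0, 0): 10.0, (1, 0): 9.8, (0, 1): 9.0, (1, 1): 8.5}
fragments = [[(0, 0), (1, 0)], [(0, 1), (1, 1)]]

labels = decode_fragments(obs, fragments, mean, std)
print(labels)
```

With these invented numbers the first fragment matches the speech model and is kept, while the second carries far more energy than the model expects in channel 1 and is assigned to the background. Mixing densities and probabilities in one score is a simplification of this sketch, and brute-force enumeration is only practical for a handful of fragments.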

Bibliographic details

  • Source
    Computer Speech and Language | 2013, Issue 3 | pp. 820-836 | 17 pages
  • Author affiliations

    Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, UK; MRC Institute of Hearing Research, University Park, Nottingham NG7 2RD, UK

    Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, UK

    Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, UK

    Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, UK

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: not listed
  • Keywords

    distant-microphone speech recognition; auditory scene analysis; binaural localisation; noise robustness;
