
Noise-Robust Speech Recognition Through Auditory Feature Detection and Spike Sequence Decoding


Abstract

Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences—one using a hidden Markov model–based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
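The template-based decoder described above compares a test word's spike sequence against stored templates from clean training data, using a similarity based on the length of the longest common subsequence (LCS). The Python sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes a spike sequence is the list of indices of the feature-detecting neurons that fired, ordered by spike time. The normalization `2 * LCS / (len(a) + len(b))` and the rule of scoring each word by its best-matching template are illustrative choices. The function names `lcs_length`, `similarity`, and `recognize` are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip one element
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Hypothetical normalization: LCS length scaled to [0, 1] by the mean length."""
    if not a and not b:
        return 0.0
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))


def recognize(test_seq, templates):
    """Return (label, score) for the word whose templates best match test_seq.

    templates maps a word label to a list of template spike sequences
    (lists of neuron indices in firing order). Scoring a word by its single
    best template is an assumption. Returns (None, -1.0) if templates is empty.
    """
    best_label, best_score = None, -1.0
    for label, seqs in templates.items():
        score = max(similarity(test_seq, t) for t in seqs)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


if __name__ == "__main__":
    # Toy templates with made-up neuron indices, for illustration only.
    templates = {
        "one": [[3, 7, 7, 12, 5], [3, 7, 12, 5, 5]],
        "two": [[9, 1, 4, 4, 8], [9, 4, 1, 8]],
    }
    # Test sequence with a spurious spike from neuron 2 inserted.
    print(recognize([3, 7, 2, 12, 5], templates))  # ('one', 0.8)
```

Because LCS simply skips elements that appear in only one sequence, a spurious spike inserted by noise (neuron 2 in the usage lines) or a missed spike lowers the score only slightly rather than misaligning the whole comparison. This is one plausible reason such a measure suits noisy input.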

Bibliographic details

  • Source
    Neural Computation, 2014, no. 3, pp. 523-556 (34 pages)
  • Author affiliations

    Department of Physics and Center for Neural Engineering, The Pennsylvania State University, University Park, PA 16802, U.S.A. pbs130@psu.edu;

    Department of Physics and Center for Neural Engineering, The Pennsylvania State University, University Park, PA 16802, U.S.A. djin@phys.psu.edu;

  • Indexed in: Science Citation Index (SCI); Chemical Abstracts (CA)
  • Original format: PDF
  • Language: English