Journal: Frontiers in Neuroscience

Biologically-Inspired Spike-Based Automatic Speech Recognition of Isolated Digits Over a Reproducing Kernel Hilbert Space


Abstract

This paper presents a novel real-time dynamic framework for quantifying time-series structure in spoken words using spikes. Audio signals are converted into multi-channel spike trains using a biologically-inspired leaky integrate-and-fire (LIF) spike generator. These spike trains are mapped into a function space of infinite dimension, i.e., a Reproducing Kernel Hilbert Space (RKHS), using point-process kernels, where a state-space model learns the dynamics of the multidimensional spike input using gradient descent learning. This kernelized recurrent system is very parsimonious and achieves the necessary memory depth via feedback of its internal states when trained discriminatively, utilizing the full context of the phoneme sequence. A main advantage of modeling nonlinear dynamics using state-space trajectories in the RKHS is that it imposes no restriction on the relationship between the exogenous input and the internal state. We are free to choose the input representation with an appropriate kernel, and changing the kernel affects neither the system nor the learning algorithm. Moreover, we show that this novel framework can outperform both traditional hidden Markov model (HMM) speech processing and neuromorphic implementations based on spiking neural networks (SNNs), yielding accurate and ultra-low-power word spotters. As a proof of concept, we demonstrate its capabilities using the benchmark TI-46 digit corpus for isolated-word automatic speech recognition (ASR) or keyword spotting. Compared to an HMM using a Mel-frequency cepstral coefficient (MFCC) front-end without time-derivatives, our MFCC-KAARMA offered improved performance. With a spike-train front-end, spike-KAARMA also outperformed state-of-the-art SNN solutions. Furthermore, compared to MFCCs, spike trains provided enhanced noise robustness in certain low signal-to-noise ratio (SNR) regimes.
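
The first two stages described in the abstract (LIF spike generation, then comparison of spike trains with a point-process kernel) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no parameter values and does not name the kernel used, so the function names `lif_encode` and `mci_kernel`, every default value (`dt`, `tau`, `threshold`, `gain`, `width`), and the choice of a memoryless cross-intensity kernel with Laplacian smoothing are assumptions made for illustration only.

```python
import math


def lif_encode(signal, dt=1e-3, tau=0.02, threshold=1.0, gain=1.0):
    """Encode one channel as spike times (in seconds) with a leaky
    integrate-and-fire neuron. All default values are illustrative."""
    v = 0.0
    spikes = []
    for n, x in enumerate(signal):
        # Forward-Euler step of dv/dt = -v / tau + gain * x
        v += dt * (-v / tau + gain * x)
        if v >= threshold:
            spikes.append(n * dt)  # record the firing time
            v = 0.0                # reset the membrane potential
    return spikes


def mci_kernel(s, t, width=0.01):
    """Memoryless cross-intensity kernel between two spike trains:
    the sum over all spike pairs of exp(-|s_i - t_j| / width).
    Assumed here as one example of a point-process kernel."""
    return sum(math.exp(-abs(a - b) / width) for a in s for b in t)


if __name__ == "__main__":
    weak = lif_encode([100.0] * 1000)    # constant drive, 1 s at 1 kHz
    strong = lif_encode([200.0] * 1000)  # stronger drive fires more often
    print(len(weak), len(strong))
    print(mci_kernel(weak, strong))
```

In this sketch a constant input below `threshold / (gain * tau)` never fires, because the leak holds the membrane potential under the threshold. The kernel value is a similarity score between spike trains, and a kernel of this kind is what would let a learning system operate on spike trains without first converting them to fixed-length vectors.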
