Journal: IEEE Transactions on Audio, Speech, and Language Processing

Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech Recognition


Abstract

In this paper, we present a novel speech feature extraction algorithm based on a hierarchical combination of auditory similarity and pooling functions. The computationally efficient features, known as “Sparse Auditory Reproducing Kernel” (SPARK) coefficients, are extracted under the hypothesis that the noise-robust information in the speech signal is embedded in a reproducing kernel Hilbert space (RKHS) spanned by overcomplete, nonlinear, and time-shifted gammatone basis functions. The feature extraction algorithm first computes a kernel-based similarity between the speech signal and the time-shifted gammatone functions, and then prunes the resulting features using a simple pooling technique (the “MAX” operation). We also describe the effect of different hyper-parameters and kernel functions on the performance of a SPARK-based speech recognizer. Experimental results on the standard AURORA2 dataset demonstrate that the SPARK-based speech recognizer delivers consistent improvements in word accuracy when compared with a baseline speech recognizer trained using the standard ETSI STQ WI008 DSR features.
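
The two stages named in the abstract (kernel similarity against time-shifted gammatone functions, then MAX pooling) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the kernel, the gammatone parameters, the shift grid, or any later processing, so the polynomial kernel, the Glasberg-Moore bandwidth formula, the 8 kHz sampling rate, the centre frequencies, and the function names (`gammatone`, `shifted_bank`, `spark_like_features`) below are all assumptions chosen for illustration.

```python
import numpy as np

def gammatone(fc, fs, length, order=4):
    """Gammatone impulse response at centre frequency fc (Hz), unit norm."""
    t = np.arange(length) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # Glasberg-Moore ERB (assumed bandwidth rule)
    envelope = t ** (order - 1) * np.exp(-2 * np.pi * 1.019 * erb * t)
    g = envelope * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)

def shifted_bank(fcs, fs, frame_len, shifts):
    """Overcomplete dictionary of time-shifted gammatones.

    Returns an array of shape (n_channels, n_shifts, frame_len);
    each atom is delayed by `shift` samples and renormalised.
    """
    bank = np.zeros((len(fcs), len(shifts), frame_len))
    for i, fc in enumerate(fcs):
        g = gammatone(fc, fs, frame_len)
        for j, s in enumerate(shifts):
            atom = np.zeros(frame_len)
            atom[s:] = g[:frame_len - s]
            bank[i, j] = atom / (np.linalg.norm(atom) + 1e-12)
    return bank

def spark_like_features(frame, bank, degree=2):
    """Kernel similarity to every atom, followed by MAX pooling over shifts.

    The kernel k(x, g) = (x . g) ** degree is an assumption; the paper
    compares several kernels that the abstract does not list.
    """
    x = frame / (np.linalg.norm(frame) + 1e-12)
    sims = (bank @ x) ** degree      # shape (n_channels, n_shifts)
    return sims.max(axis=1)          # one pooled value per channel

# Hypothetical settings: 25 ms frames at 8 kHz, four channels, four shifts.
fs, frame_len = 8000, 200
fcs = [300.0, 600.0, 1200.0, 2400.0]
shifts = [0, 20, 40, 60]
bank = shifted_bank(fcs, fs, frame_len, shifts)

# A noisy, delayed 1200 Hz gammatone used as a stand-in for a speech frame.
rng = np.random.default_rng(0)
frame = bank[2, 1] + 0.02 * rng.standard_normal(frame_len)
features = spark_like_features(frame, bank)
print(features)
```

Pooling with the maximum over shifts makes each channel's value insensitive to where the matching pattern falls inside the frame, which is one plausible reading of why the abstract describes the pooling step as feature pruning.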
