The speech feature extraction algorithm is based on a hierarchical combination of auditory similarity and pooling functions. Computationally efficient features referred to as “Sparse Auditory Reproducing Kernel” (SPARK) coefficients are extracted under the hypothesis that the noise-robust information in speech signal is embedded in a reproducing kernel Hilbert space (RKHS) spanned by overcomplete, nonlinear, and time-shifted gammatone basis functions. The feature extraction algorithm first involves computing kernel based similarity between the speech signal and the time-shifted gammatone functions, followed by feature pruning using a simple pooling technique (“MAX” operation). Different hyper-parameters and kernel functions may be used to enhance the performance of a SPARK based speech recognizer.
展开▼