Prevalent speaker recognition methods use only spectral-envelope based features such as MFCC, ignoring the rich speaker identity information contained in the temporal-spectral dynamics of the entire speech signal. We propose a new feature for speaker recognition based on sinusoidal representation of speech called compressed spectral dynamics (Sinogram-CSD), which effectively captures such spectral dynamics and the inherent speaker identity. The discriminative power of CSD allows classification to remain simple. The proposed CSD-MSRI method uses a simple nearest neighbor classifier to deliver performance competitive to conventional MFCC+DTW based text-dependent speaker recognition methods at significantly lower complexity.
展开▼