This paper investigates non-uniform time sampling methods for spectral/temporal feature extraction in automatic speech recognition. In most current methods for signal modeling of speech, "dynamic" features are computed from frame-based parameters using fixed time sampling, i.e., a fixed block length and a fixed block spacing. This work explores new methods in which block length and/or block spacing are variable. Three methods are proposed, and each was tested on the TIMIT database using a standard HMM recognizer. Phone recognition experiments were conducted using the standard 39-phone set. The methods were also evaluated with HMMs of various model complexities. Experimental results indicated that none of the proposed non-uniform time sampling methods performs significantly better than fixed time sampling. However, the best results obtained with the front end are comparable to those obtained with current state-of-the-art systems. In addition, the performance of our monophone system surpasses that of most reported context-dependent monophone systems.