Band-independent categories are investigated for feature estimation in automatic speech recognition (ASR). These categories represent distinct speech events that are manifested in frequency-localized temporal patterns of the speech signal. A single, universal estimator is proposed that estimates speech-event posterior probabilities from temporal patterns of critical-band energies, using the same estimator for all bands. The estimated posteriors are used as input features (referred to as speech-event features) to a back-end recognizer. These features are evaluated on the continuous OGI-Digits task, and on the Aurora-2 and Aurora-3 tasks in a Distributed Speech Recognition (DSR) framework. They are compared with the previously proposed broad-phonetic TRAPs features, which are estimated from temporal patterns using an independent estimator in each critical band.
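
As an illustration of the data layout this implies, the following minimal sketch slices a temporal pattern for every band and frame from a matrix of critical-band energies, then pools the patterns from all bands into one set that a single estimator could be trained on. It is a sketch under stated assumptions, not the procedure used in this work: the function names `temporal_patterns` and `pool_across_bands`, the context length, and the edge padding are all hypothetical choices, and the estimator itself is omitted because its form is not specified here.

```python
import numpy as np


def temporal_patterns(band_energies, context=50):
    """Slice per-band temporal patterns from a (frames, bands) energy matrix.

    Returns an array of shape (frames, bands, 2 * context + 1) in which
    entry [t, b] is the energy trajectory of band b centred on frame t.
    The context length and the edge padding are illustrative assumptions.
    """
    energies = np.asarray(band_energies, dtype=float)
    n_frames, _ = energies.shape
    # Repeat the first and last frames so every frame has a full window.
    padded = np.pad(energies, ((context, context), (0, 0)), mode="edge")
    width = 2 * context + 1
    # Each window is a (width, bands) block starting at frame t of the padded signal.
    windows = np.stack([padded[t:t + width] for t in range(n_frames)])
    # Reorder to (frames, bands, width) so each band's trajectory is contiguous.
    return windows.transpose(0, 2, 1)


def pool_across_bands(patterns):
    """Flatten (frames, bands, width) into (frames * bands, width).

    Band identity is discarded, so one estimator sees patterns from every
    band, in contrast to training a separate estimator per band.
    """
    n_frames, n_bands, width = patterns.shape
    return patterns.reshape(n_frames * n_bands, width)
```

With this layout, a per-band scheme would train one estimator on `patterns[:, b, :]` for each band `b`, whereas the pooled matrix feeds a single estimator that is then applied to each band's pattern in turn.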