In this paper, we extend our previously proposed HMM-SM-based speech recognition system [1, 2, 3] to a connected digit recognition task. We explore the effect of normalizing the acoustic qualities of the monophones in an utterance, and compare the resulting system with several HMM-based systems: with utterance-level normalization, with word-level normalization, with monophone-level normalization, and without normalization. In the proposed HMM-SM-based system, an HMM-based classifier produces the N-best hypotheses (word candidates), and an SM (Subspace Method)-based verifier then tests these hypotheses after monophone score normalization is applied. In experiments on a connected digit recognition task, the proposed method significantly improved the word correct rate from 96.3% to 98.7% and the word accuracy rate from 95.7% to 98.2%, compared with the conventional HMM-based classifier with utterance-level normalization. The proposed method also outperformed the other HMM-based systems included in the comparison.
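The two-stage flow described above (N-best hypotheses from a classifier, then re-ranking by a verifier on normalized monophone scores) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the names `Segment`, `Hypothesis`, `normalize`, `verify` and `rescore` are hypothetical, the per-monophone statistics are made-up numbers, and z-score normalization stands in for whatever normalization the system actually uses. The raw segment scores are taken as given; the sketch does not compute subspace similarities.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class Segment:
    phone: str    # monophone label of this segment
    score: float  # raw verifier score for the segment (hypothetical value)


@dataclass
class Hypothesis:
    words: List[str]         # word candidates from the first-stage classifier
    segments: List[Segment]  # monophone segments aligned to the hypothesis


# Assumed per-monophone statistics: label -> (mean, standard deviation)
Stats = Dict[str, Tuple[float, float]]


def normalize(seg: Segment, stats: Stats) -> float:
    """Z-score a raw segment score against its own monophone's statistics."""
    mu, sigma = stats[seg.phone]
    return (seg.score - mu) / sigma


def verify(hyp: Hypothesis, stats: Stats) -> float:
    """Score a hypothesis as the mean of its normalized segment scores."""
    return mean(normalize(s, stats) for s in hyp.segments)


def rescore(nbest: List[Hypothesis], stats: Stats) -> Hypothesis:
    """Return the hypothesis with the highest normalized verifier score."""
    return max(nbest, key=lambda h: verify(h, stats))


# Illustrative data only: phone "b" tends to score higher than phone "a".
stats = {"a": (0.5, 0.1), "b": (0.8, 0.1)}
nbest = [
    Hypothesis(["one"], [Segment("b", 0.85), Segment("b", 0.85)]),
    Hypothesis(["two"], [Segment("a", 0.70), Segment("a", 0.70)]),
]
best = rescore(nbest, stats)
```

In this toy data the first hypothesis has the higher raw scores, but only because its monophone scores high on average. After normalization the second hypothesis ranks first, which is the kind of bias that monophone-level score normalization is meant to remove.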