An instantaneous speaker adaptation method is proposed that uses N-best decoding for speech recognition systems based on continuous mixture-density hidden Markov models. The method adopts the N-best paradigm of multiple-pass search strategies, which keeps it effective even for speakers whose decodings using speaker-independent models are error-prone. To cope with insufficient data, the method uses constrained maximum a posteriori estimation: the parameter vector space is clustered, and a mixture-mean bias is estimated for each cluster. To maintain continuity between clusters, the bias applied to each mixture mean is calculated as a weighted sum of the estimated cluster biases. In connected-digit (four-digit string) recognition experiments performed over actual telephone lines, the method reduced error rates by more than 20%, even for speakers whose decodings using speaker-independent models were error-prone.
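The cluster-wise bias estimation and the weighted-sum smoothing described above can be illustrated with a minimal sketch. It is not the authors' formulation: the abstract gives no equations, so the shrinkage form of the bias estimate (a zero-mean prior with weight `tau`), the softmax-style distance weights (sharpness `beta`), and all function and parameter names are assumptions chosen for illustration. The frame-to-mixture alignment `frame_mean_idx` is assumed to come from the N-best decoding pass, and the clustering (`cluster_of`, `centroids`) is assumed to be given.

```python
import math


def cluster_biases(frames, frame_mean_idx, means, cluster_of, n_clusters, tau=2.0):
    """Estimate one mean bias per cluster, shrunk toward zero.

    tau is a hypothetical prior weight (must be > 0 if a cluster may be empty):
    larger values keep the adapted means closer to the speaker-independent ones.
    """
    dim = len(means[0])
    sums = [[0.0] * dim for _ in range(n_clusters)]
    counts = [0] * n_clusters
    for x, m in zip(frames, frame_mean_idx):
        c = cluster_of[m]  # cluster that the aligned mixture mean belongs to
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += x[d] - means[m][d]  # accumulate frame-minus-mean residual
    return [[s / (tau + counts[c]) for s in sums[c]] for c in range(n_clusters)]


def smoothed_bias(mean, centroids, biases, beta=1.0):
    """Bias for one mixture mean as a weighted sum of all cluster biases.

    Weights fall off with squared distance to each cluster centroid (assumed form).
    """
    d2 = [sum((a - b) ** 2 for a, b in zip(mean, c)) for c in centroids]
    nearest = min(d2)  # subtract the minimum for numerical stability
    w = [math.exp(-beta * (d - nearest)) for d in d2]
    z = sum(w)
    return [
        sum(w[k] / z * biases[k][d] for k in range(len(biases)))
        for d in range(len(mean))
    ]


def adapt_means(means, centroids, biases, beta=1.0):
    """Shift every mixture mean by its smoothed bias."""
    return [
        [m_d + b_d for m_d, b_d in zip(m, smoothed_bias(m, centroids, biases, beta))]
        for m in means
    ]


# Toy one-dimensional example: two mixture means, one per cluster.
means = [[0.0], [10.0]]
cluster_of = [0, 1]
centroids = [[0.0], [10.0]]
frames = [[1.0], [1.0], [12.0]]
frame_mean_idx = [0, 0, 1]

biases = cluster_biases(frames, frame_mean_idx, means, cluster_of, n_clusters=2)
adapted = adapt_means(means, centroids, biases)
```

Because every mixture mean receives a blend of all cluster biases rather than only the bias of its own cluster, two means lying close together on opposite sides of a cluster boundary are shifted by nearly the same amount, which is the continuity property the abstract refers to.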