A set of techniques for configuring a speech recognition system for a particular user is described in the context of voice label recognition over the public switched telephone network. User-configurable vocabularies are provided through automatic acoustic baseform determination based on an inventory of speaker-independent subword acoustic units. The tendency of input utterances to contain out-of-vocabulary or non-speech information is accounted for by using likelihood ratio-based utterance verification procedures. The mismatch between a given user's utterances and the hidden Markov model is accounted for by using a frequency-warping approach to speaker normalization. The performance of these techniques was evaluated on utterances taken from a trial version of a voice label recognition service.
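The likelihood ratio-based verification step mentioned above can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `verify_utterance`, the use of a single alternate (background or filler) model score, the per-frame normalization, and the threshold value are all assumptions made for illustration.

```python
def verify_utterance(target_loglik, alternate_loglik, num_frames, threshold=0.0):
    """Hypothetical likelihood ratio test for utterance verification.

    target_loglik:    log-likelihood of the utterance under the recognized
                      voice-label model (assumed to come from an HMM decoder).
    alternate_loglik: log-likelihood under an alternate model, e.g. a
                      background or filler model (an assumption here).
    num_frames:       number of acoustic frames, used to normalize the score.
    threshold:        decision threshold on the per-frame log ratio
                      (illustrative value; it would be tuned on held-out data).
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Log-likelihood ratio, normalized by duration so that long and short
    # utterances are compared on the same scale.
    llr = (target_loglik - alternate_loglik) / num_frames
    # Accept the hypothesis only if the target model explains the
    # utterance sufficiently better than the alternate model.
    return llr >= threshold, llr


if __name__ == "__main__":
    accepted, score = verify_utterance(-1200.0, -1300.0, 100, threshold=0.5)
    print(accepted, score)
```

An utterance whose target score does not exceed the alternate score by the required margin would be rejected as out-of-vocabulary or non-speech rather than mapped to a voice label.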