In a maximum a posteriori probability approach to speech recognition, stochastic n-gram language models are used to estimate the a priori probability of a word sequence. In any practical implementation of a large-vocabulary speech recognition system, the language model acts as a hypothesis filter that has to discriminate between candidate words with similar acoustic evidence. For that purpose, the combination of word-based and class-based language models is attractive, because it allows falling back on the more reliable estimates of the class-based model when training data are sparse. However, class-based language models can distinguish words from the same class only by their a priori probability. To improve the discriminative power for words with similar acoustic scores, it is therefore useful to put similar-sounding words into different classes.
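To make this concrete, the following is a minimal sketch of the underlying formulas, assuming a bigram context and a simple linear interpolation of the two models; the interpolation weight \(\lambda\) and the class map \(C(\cdot)\) are illustrative assumptions, since the exact combination scheme (interpolation versus backing off) is not specified above. The maximum a posteriori decision rule is
\[
\hat{W} \;=\; \arg\max_{W} \; P(X \mid W)\, P(W),
\]
where \(X\) denotes the acoustic observations and \(P(W)\) is the language-model prior. With a bigram class model, a word's probability factors into a class transition and a within-class membership term,
\[
P_{\mathrm{class}}(w_i \mid w_{i-1}) \;\approx\; P\bigl(w_i \mid C(w_i)\bigr)\, P\bigl(C(w_i) \mid C(w_{i-1})\bigr),
\]
and one common way to combine it with the word bigram is linear interpolation,
\[
P(w_i \mid w_{i-1}) \;=\; \lambda\, P_{\mathrm{word}}(w_i \mid w_{i-1}) \;+\; (1-\lambda)\, P_{\mathrm{class}}(w_i \mid w_{i-1}), \qquad 0 \le \lambda \le 1 .
\]
The class-model factorization makes the limitation explicit: within a class the context contributes nothing, so two words \(v\) and \(w\) with \(C(v) = C(w)\) differ only through \(P(v \mid C)\) versus \(P(w \mid C)\), which motivates separating acoustically confusable words into different classes.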