In this work, the application of across-word phoneme models in large vocabulary continuous speech recognition is studied. A recognition system will be developed which allows for the training of high-performance across-word phoneme models, the efficient application of these models in combination with long-span language models in a single search pass, and the construction of word graphs. In contrast to within-word phoneme models, which model the context dependency of the phonemes representing the words in the vocabulary only within the words and fall back to a reduced phonetic context at word boundaries, across-word phoneme models capture this context dependency across word boundaries as well. As has been known for many years, this results in significant word error rate improvements, but also in considerably higher computational effort.

Today, across-word phoneme models are applied by a number of groups. However, the published descriptions of these recognition systems are often quite general, and many implementation details needed for the successful application of across-word phoneme models are missing. In this work, all details of the transformation of a baseline within-word model system into an across-word model system will be discussed. It will be analyzed in detail how the introduction of across-word phoneme models affects the word error rate, runtime, and memory requirements of the recognition system.

First, the across-word model paradigm will be integrated into the very general Bayes' decision rule which forms the basis of speech recognition. Taking into account all model assumptions and approximations needed for the application of across-word models, a specialized decision rule will be derived. Based on this specialized decision rule, the across-word model system will be developed. Compared to the baseline within-word model system, the introduction of across-word phoneme models results in a significantly more complex search network.
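The Bayes' decision rule referred to above can be stated in its standard form; the notation below (a word sequence w_1^N, acoustic observation vectors x_1^T) is the common speech-recognition convention and is used here only as an illustrative sketch, not as the specialized rule derived in this work:

```latex
% Bayes' decision rule: choose the word sequence w_1^N that maximizes
% the posterior probability given the acoustic observations x_1^T.
\hat{w}_1^{N} = \operatorname*{argmax}_{w_1^{N}} \; p(w_1^{N} \mid x_1^{T})
             = \operatorname*{argmax}_{w_1^{N}} \; \underbrace{p(w_1^{N})}_{\text{language model}} \cdot \underbrace{p(x_1^{T} \mid w_1^{N})}_{\text{acoustic model}}
```

The acoustic model term is where the phoneme models enter: with across-word models, the probability of the phonemes at a word boundary depends on the neighboring words, which couples the two factors more tightly during search.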
The efficient application of across-word phoneme models in combination with long-span language models in a single search pass requires a careful design of both the search network and the search algorithm, which will be discussed in detail. In contrast to the baseline within-word model training, the phonetic representation of the training utterances is no longer unique if across-word models are to be trained. Furthermore, the parameterization of the baseline within-word model training should be modified in order to obtain optimally performing across-word models. Finally, the introduction of across-word models also affects the construction of word graphs.

In order to further optimize the runtime of the developed across-word model search, several acceleration methods will be applied, some of which have already been discussed for within-word model systems in the literature. In addition, methods for further increasing the accuracy of across-word models, based on refined pronunciation modeling, will be studied. The developed across-word model system will finally be evaluated on three different speech corpora by comparing its recognition results to those of the baseline within-word model system. On two of the corpora, these results will also be compared to results of other research groups, as published in the literature. It will be seen that the developed recognition system produces state-of-the-art word error rates.
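The boundary effect described above, and the reason the phonetic representation of a training utterance is no longer unique once the word context matters, can be sketched with a toy triphone expansion. The phoneme symbols, the example words, and the helper functions below are illustrative assumptions, not the actual lexicon or context-expansion code of the system discussed here:

```python
# Sketch: within-word vs. across-word triphone expansion (illustrative).
# A word is a list of phoneme symbols; "#" marks a reduced (empty) context.

def within_word_triphones(words):
    """Expand each word independently: phonemes at word boundaries
    only see a reduced context ('#'), regardless of the neighbors."""
    result = []
    for phones in words:
        for i, p in enumerate(phones):
            left = phones[i - 1] if i > 0 else "#"
            right = phones[i + 1] if i < len(phones) - 1 else "#"
            result.append((left, p, right))
    return result

def across_word_triphones(words):
    """Expand the utterance as one phoneme string: boundary phonemes
    take their context from the neighboring word, so the expansion
    of a word depends on which words surround it."""
    phones = [p for w in words for p in w]
    result = []
    for i, p in enumerate(phones):
        left = phones[i - 1] if i > 0 else "#"
        right = phones[i + 1] if i < len(phones) - 1 else "#"
        result.append((left, p, right))
    return result

# "this is" as (assumed) phoneme sequences:
utterance = [["dh", "ih", "s"], ["ih", "z"]]
print(within_word_triphones(utterance))
print(across_word_triphones(utterance))
```

Note that the across-word expansion of `["dh", "ih", "s"]` changes whenever the following word changes, while the within-word expansion is always the same; this is exactly why the search network grows and why the training alignment is no longer unique.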