In this paper, we study state clustering using word contexts for speech recognition. In spontaneous speech, poorly articulated words often cause recognition errors. To improve recognition performance, we add two types of questions to phonetic decision-tree-based state clustering. One is a question about parts of speech, and the other is a question about the position of phones within a word. To apply the question about parts of speech, we classify parts of speech into two classes based on word durations estimated from a corpus of spontaneous speech. After building HMMs for each class, we carry out state clustering using a context decision tree with questions about the classes. To apply the questions about the position of phones within a word, we build separate HMMs for phones at the beginning of a word, phones at the end of a word, and phones at other positions. We then carry out state clustering using a context decision tree with questions about phone positions. We carried out speech recognition experiments using the Corpus of Spontaneous Japanese (CSJ). In the best case, word accuracy improved by 2.4 points with the former method (part-of-speech questions) and by 6.1 points with the latter method (phone-position questions).
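As an illustration of the phone-position labeling described above, the following is a minimal sketch and not the implementation used in the experiments. The function name `tag_phone_positions`, the tag symbols `B`, `E`, and `M`, the question names, and the treatment of single-phone words as word-initial are all assumptions introduced here for illustration.

```python
from typing import Callable, Dict, List, Tuple


def tag_phone_positions(phones: List[str]) -> List[Tuple[str, str]]:
    """Label each phone of one word by its position within the word.

    'B' = phone at the beginning of the word
    'E' = phone at the end of the word
    'M' = phone at any other position
    """
    tagged = []
    last = len(phones) - 1
    for i, phone in enumerate(phones):
        if i == 0:
            # Assumption: a single-phone word is treated as word-initial.
            tag = "B"
        elif i == last:
            tag = "E"
        else:
            tag = "M"
        tagged.append((phone, tag))
    return tagged


# Hypothetical yes/no questions that a context decision tree could ask
# about the position tag when splitting a pool of HMM states.
POSITION_QUESTIONS: Dict[str, Callable[[str], bool]] = {
    "is_word_initial": lambda tag: tag == "B",
    "is_word_final": lambda tag: tag == "E",
}

if __name__ == "__main__":
    for phone, tag in tag_phone_positions(["k", "a", "w", "a"]):
        answers = {name: q(tag) for name, q in POSITION_QUESTIONS.items()}
        print(phone, tag, answers)
```

In such a scheme, the position tag would simply be one more attribute of each phone context, so the position questions can be evaluated alongside the usual phonetic-context questions during tree growing.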