A method is presented for speeding up the performance of the HMM based speech recognition system where the states are modeled by a large number of Gaussian kernels. The emission probabilities of the states are usually dominated by the nearest Gaussians to the input vector. The speedup is gained without deteriorating the recognition accuracy by concentrating on these kernels in the reduced K-best-kernel search. In this work, the time information of the input is encoded to the connections of the kernels. The search for the dominating kernels is then performed along the kernel connections which model the trajectories of the speech in the feature space. In the experiments, speaker-dependent speech recognizers were trained for ten speakers. The number of distance computations between feature vectors and kernel mean vectors was reduced 75% without increasing the average phoneme recognition error, which was 5.7% for the baseline system.
展开▼