This paper aims at high-accuracy recognition of lecture speech as a representative form of spontaneous speech. To this end, speaker adaptation of both the acoustic model and the language model is investigated. For the acoustic model, supervised and unsupervised adaptation are combined. For the language model, a word-based baseline model is linearly interpolated with a class-based model built from the recognition results. Recognition experiments on an evaluation set of the Corpus of Spontaneous Japanese (CSJ) show that acoustic model adaptation and language model adaptation improve the word error rate (WER) by seven points and two points, respectively, and that combining the two improves WER by nine points. As a result, a WER of 20.9% was obtained.
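As an illustration of the language model interpolation mentioned in the abstract, a minimal sketch is given here. The function names, the weight `lam`, and all probability values are hypothetical and are not taken from the paper. The class-based factorization P(c|h) * P(w|c) is the commonly used formulation and is assumed here rather than confirmed by the source.

```python
def class_based_prob(p_class_given_history, p_word_given_class):
    """Class-based estimate P(w|h) = P(c|h) * P(w|c) (assumed factorization)."""
    return p_class_given_history * p_word_given_class


def interpolate(p_word, p_class, lam):
    """Linear interpolation: lam * P_word(w|h) + (1 - lam) * P_class(w|h)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_word + (1.0 - lam) * p_class


# Hypothetical numbers for a single word in a single history.
p_word = 0.010                            # word-based baseline model
p_class = class_based_prob(0.20, 0.15)    # class-based model from recognition results
p_mix = interpolate(p_word, p_class, lam=0.7)
print(p_mix)
```

In practice the weight would be tuned on held-out data; the value 0.7 above is only a placeholder.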