We study a syllable-based acoustic modeling method for Japanese spontaneous speech recognition. Traditionally, mora-based acoustic models have been adopted in Japanese read speech recognition systems. In this paper, syllable-based and mora-based units are clearly distinguished by definition, and syllables are shown to be more suitable as acoustic modeling units for Japanese spontaneous speech recognition. In spontaneous speech, vowel lengthening occurs frequently, and recognition accuracy is greatly affected by this phenomenon. From this viewpoint, we propose an acoustic modeling technique that explicitly incorporates vowel lengthening into syllable-based HMMs. Experimental results showed that the proposed model outperformed the conventionally used cross-word triphone model and the mora-based model on a Japanese spontaneous speech recognition task.