Large-vocabulary continuous Mandarin speech recognition has been an important problem for speech recognition researchers for several reasons [1], [3]. First, Mandarin is a tonal language that requires special treatment for the modeling of tones. There are five tones in Mandarin, which are necessary to disambiguate otherwise confusable words. Second, the difficulty of entering Chinese by keyboard presents a great opportunity for speech recognition to improve computer usability. Previous approaches to modeling tones have included using a separate tone classifier [1] and incorporating pitch directly into the feature vector [3]. In this paper, we describe a large-vocabulary Mandarin speech recognition system based on Microsoft's Whisper system. We compare several alternatives for modeling tones and their error rates on continuous speech.