While early systems adopted isolated syllables as input units and required tedious enrollment, our research focuses on speaker-independent, word-based dictation. A deliberately designed 120-speaker database was built for training; inter-syllable, tonal, and endpoint-dependent acoustic models are applied with promising MFCC features. A two-pass acoustic matching scheme accelerates recognition, taking full advantage of the monosyllabic structure of Chinese speech. Complete word bigram and trigram models serve as the language processing module. With all these efforts combined, the system reaches 90% character accuracy, running in almost real time on a Pentium PC without DSP assistance.
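To make the language processing module concrete, the following is a minimal sketch of the kind of word bigram/trigram model the abstract describes, combined here by simple linear interpolation. The corpus, interpolation weights, and class name are illustrative assumptions, not the paper's actual configuration.

```python
import math
from collections import defaultdict

class InterpolatedLM:
    """Hypothetical bigram/trigram language model with linear interpolation."""

    def __init__(self, corpus, l1=0.4, l2=0.3, l3=0.3):
        # l1..l3 are assumed interpolation weights for uni-, bi-, trigrams.
        self.uni = defaultdict(int)
        self.bi = defaultdict(int)
        self.tri = defaultdict(int)
        self.total = 0
        for sent in corpus:
            toks = ["<s>", "<s>"] + sent + ["</s>"]
            for i in range(2, len(toks)):
                self.uni[toks[i]] += 1
                self.bi[(toks[i - 1], toks[i])] += 1
                self.tri[(toks[i - 2], toks[i - 1], toks[i])] += 1
                self.total += 1
        self.l1, self.l2, self.l3 = l1, l2, l3

    def prob(self, w2, w1, w):
        # Interpolate unigram, bigram, and trigram relative frequencies.
        p1 = self.uni[w] / self.total if self.total else 0.0
        p2 = self.bi[(w1, w)] / self.uni[w1] if self.uni[w1] else 0.0
        h = self.bi[(w2, w1)]  # trigram history count
        p3 = self.tri[(w2, w1, w)] / h if h else 0.0
        return self.l1 * p1 + self.l2 * p2 + self.l3 * p3

    def logprob(self, sent):
        # Log-probability of a full word sequence, padded with boundary marks.
        toks = ["<s>", "<s>"] + sent + ["</s>"]
        return sum(math.log(self.prob(toks[i - 2], toks[i - 1], toks[i]))
                   for i in range(2, len(toks)))
```

In a dictation system such a score would be combined with the two-pass acoustic match scores to rank candidate word sequences.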