Articulation training with multiple kinds of stimuli, such as visual, auditory, and articulatory information, can teach users to pronounce correctly and improve their articulatory ability. In this paper, an articulation training system with an intelligent interface and multimodal feedback is proposed to improve the performance of articulation training. Clinical knowledge of speech evaluation is used to design the dependent network. Automatic speech recognition with the dependent network is then applied to identify pronunciation errors. In addition, a hierarchical Bayesian network is proposed to recognize the user's emotion from speech. Using the information on pronunciation errors and the user's emotional state, the articulation training sentences can be dynamically selected. Finally, a 3D facial animation teaches users to pronounce a sentence through speech, lip motion, and tongue motion. Experimental results demonstrate the usefulness of the proposed method and system.