This paper presents a novel method for automatic pronunciation quality assessment. Unlike the traditional "Goodness of Pronunciation" (GOP) method, we judged the pronunciation quality of an utterance directly with a discriminative method. Under this framework, we also designed an algorithm to calculate the assessment confidence. We decoded the student's utterance in two passes. The first pass served only to obtain the phone time boundaries, and the second pass differentiated the pronunciation quality of each triphone. In the second pass, we used a specially trained acoustic model (AM) in which triphones of different pronunciation qualities were trained as separate units. The confidence of each phone-level score was also calculated, and low-confidence phone-level scores were excluded when calculating the word-level score. The experimental results show that scoring performance improved significantly compared to the traditional GOP method.
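The confidence-based exclusion step can be illustrated with a minimal sketch. The abstract does not specify how phone-level scores are combined into a word-level score, so the arithmetic mean, the 0 to 1 score and confidence scales, the threshold `min_conf`, and the names `PhoneScore` and `word_score` are all assumptions made for illustration, not the method used in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PhoneScore:
    phone: str         # phone label, e.g. "ae"
    score: float       # phone-level quality score (assumed 0..1 scale)
    confidence: float  # confidence of that score (assumed 0..1 scale)


def word_score(phones: List[PhoneScore], min_conf: float = 0.5) -> Optional[float]:
    """Average the phone-level scores whose confidence is at least min_conf.

    Averaging and the threshold value are illustrative assumptions.
    Returns None when every phone score is excluded.
    """
    kept = [p.score for p in phones if p.confidence >= min_conf]
    if not kept:
        return None
    return sum(kept) / len(kept)


# Hypothetical phone-level results for one word.
phones = [
    PhoneScore("k", 0.9, 0.8),
    PhoneScore("ae", 0.2, 0.1),  # low confidence, so it is excluded
    PhoneScore("t", 0.7, 0.9),
]
print(word_score(phones))
```

In this sketch the low-confidence score for "ae" does not pull the word-level score down; only the two confident phone scores contribute.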