In this paper, we propose a new method for detecting misrecognized utterances based on a voting scheme like ROVER. ROVER has two serious problems: 1) it is difficult to construct multiple speech recognition systems (SRSs), and 2) the computational cost increases with the number of SRSs. In contrast to conventional ROVER, the proposed method uses multiple language models (LMs), namely a general LM and sub-LMs generated from clustered sentences, instead of different SRSs. Speech recognition with the sub-LMs is performed by rescoring instead of parallel decoding. In experiments, the proposed method achieved 18-point higher precision with a 10% loss of recall relative to the baseline, and 22-point higher precision with a 20% loss of recall.
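
The voting idea can be illustrated with a minimal sketch. The abstract does not specify the voting rule, so everything here is an assumption made for illustration: the hypothetical function names `vote_support` and `is_misrecognized`, exact string matching at the utterance level, and the 0.5 threshold. The sketch takes the hypothesis obtained with the general LM, compares it with the hypotheses obtained by rescoring with each sub-LM, and flags the utterance as misrecognized when too few sub-LMs agree.

```python
def vote_support(general_hyp, sub_hyps):
    """Return the fraction of sub-LM hypotheses identical to the general-LM one.

    Assumption: agreement is exact string equality on the whole utterance.
    """
    if not sub_hyps:
        raise ValueError("at least one sub-LM hypothesis is required")
    votes = sum(1 for hyp in sub_hyps if hyp == general_hyp)
    return votes / len(sub_hyps)


def is_misrecognized(general_hyp, sub_hyps, threshold=0.5):
    """Flag the utterance when sub-LM support falls below the threshold.

    Assumption: the threshold value is illustrative, not taken from the paper.
    """
    return vote_support(general_hyp, sub_hyps) < threshold


if __name__ == "__main__":
    general = "turn on the light"
    rescored = ["turn on the light", "turn on the night", "turn on the light"]
    print(vote_support(general, rescored))
    print(is_misrecognized(general, rescored))
```

Raising the threshold in such a scheme would flag more utterances, which is one way a trade-off between precision and recall like the one reported above could arise.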