This paper proposes a new on-the-fly composition algorithm for Weighted Finite-State Transducers (WFSTs) in large-vocabulary continuous speech recognition. In general on-the-fly composition, two transducers are composed during decoding, and a Viterbi search is performed over the composed search space. In the new method, the Viterbi search is performed over the first of the two transducers only, and the second transducer is used solely to rescore the hypotheses generated during the search. Since this rescoring is very efficient, the total amount of computation in the new method is almost the same as when using only the first transducer. In a 30k-word vocabulary spontaneous lecture speech transcription task, our proposed method significantly outperformed the general on-the-fly composition method. Furthermore, our method was slightly faster than decoding with a single fully composed and optimized WFST, while consuming only 20% of the memory required for decoding with that single WFST. Finally, we have achieved one-pass real-time speech recognition with an extremely large vocabulary of 1.8 million words.
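The rescoring idea described above can be illustrated with a minimal toy sketch. It is not the implementation evaluated in the paper: the transducers `A_ARCS` and `B_ARCS`, their weights, and the `decode` function are hypothetical, weights are treated as costs to be minimized, and no pruning or efficiency optimization is attempted. The search advances over the states of the first transducer, and each hypothesis carries a small set of co-hypotheses that record the state and accumulated cost in the second transducer; the second transducer is consulted only when the first emits an output label.

```python
from collections import defaultdict

EPS = None  # epsilon output label: the arc emits no word

# Hypothetical first transducer A: state -> [(input, output, cost, next_state)]
A_ARCS = {
    0: [("a", "x", 1.0, 1), ("a", "y", 0.5, 1)],
    1: [("b", EPS, 0.25, 2)],
}
A_START, A_FINAL = 0, {2}

# Hypothetical second transducer B, used only for rescoring:
# (state, label) -> (cost, next_state)
B_ARCS = {
    (0, "x"): (0.125, 1),
    (0, "y"): (2.0, 1),
}
B_START, B_FINAL = 0, {1}


def decode(inputs):
    """Search over A's states; rescore with B whenever A emits a label."""
    # hyps[a_state][b_state] = (total_cost, emitted_words)
    hyps = {A_START: {B_START: (0.0, ())}}
    for sym in inputs:
        new_hyps = defaultdict(dict)
        for a_state, cohyps in hyps.items():
            for isym, osym, w_a, a_next in A_ARCS.get(a_state, []):
                if isym != sym:
                    continue
                for b_state, (cost, words) in cohyps.items():
                    if osym is EPS:
                        # No word emitted: B stays where it is.
                        w_b, b_next, new_words = 0.0, b_state, words
                    elif (b_state, osym) in B_ARCS:
                        # Word emitted: advance B and add its cost.
                        w_b, b_next = B_ARCS[(b_state, osym)]
                        new_words = words + (osym,)
                    else:
                        continue  # B rejects this word sequence
                    total = cost + w_a + w_b
                    best = new_hyps[a_next].get(b_next)
                    if best is None or total < best[0]:
                        new_hyps[a_next][b_next] = (total, new_words)
        hyps = new_hyps
    finals = [
        (cost, words)
        for a_state, cohyps in hyps.items() if a_state in A_FINAL
        for b_state, (cost, words) in cohyps.items() if b_state in B_FINAL
    ]
    return min(finals) if finals else None


print(decode(["a", "b"]))
```

In this toy example the first transducer alone prefers the output `y` (cost 0.5 against 1.0 for `x`), but after rescoring with the second transducer the output `x` wins, which shows how the second transducer changes the ranking without being composed into the search space beforehand.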