This work describes an HMM-based keyword spotting system. Keywords are modeled as concatenations of the corresponding phoneme models; consequently, no specific databases are needed to train the system. In addition, no filler models are required, so the computational requirements are small. The system consists of two main stages. The first stage is based on previous work by Junkawitsch et al. For each keyword, it calculates a score signal that measures the match between the keyword model and the utterance, and it extracts from that signal the segments where the match is good. The segments corresponding to possible keywords are passed as input hypotheses to the second stage, which computes a new confidence measure. This second score is based on a comparison between the vector of emission probabilities for a hypothesis under the keyword model and the vector of emission probabilities for the best sequence of phonemes in the segment where the hypothesis was detected. The first score is linearly combined with the second one, yielding a new score that performs significantly better than the first score alone.
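The two-stage flow can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how segments are extracted, how the two emission-probability vectors are compared, or what combination weight is used. The sketch assumes that a higher score means a better match, that segments are found by simple thresholding, that the comparison is a mean per-frame difference of log emission probabilities, and that the weight `alpha` is a free parameter. All function names are hypothetical.

```python
from typing import List, Tuple


def extract_segments(score: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges where the score signal is >= threshold.

    Assumption: higher score means a better keyword match; thresholding is
    a stand-in for the first-stage segment extraction.
    """
    segments = []
    start = None
    for i, s in enumerate(score):
        if s >= threshold and start is None:
            start = i                      # a candidate segment opens
        elif s < threshold and start is not None:
            segments.append((start, i))    # the candidate segment closes
            start = None
    if start is not None:                  # segment runs to the last frame
        segments.append((start, len(score)))
    return segments


def second_score(keyword_logp: List[float], best_phone_logp: List[float]) -> float:
    """Compare two per-frame log emission probability vectors.

    Assumption: the comparison is the mean per-frame difference between the
    keyword model and the best phoneme sequence over the same segment.
    """
    if not keyword_logp or len(keyword_logp) != len(best_phone_logp):
        raise ValueError("vectors must be non-empty and of equal length")
    diffs = [k - b for k, b in zip(keyword_logp, best_phone_logp)]
    return sum(diffs) / len(diffs)


def combined_score(first: float, second: float, alpha: float = 0.5) -> float:
    """Linear combination of the two scores; alpha is a hypothetical weight."""
    return alpha * first + (1.0 - alpha) * second
```

Under these assumptions, each segment returned by `extract_segments` would be rescored with `second_score`, and the hypothesis would be accepted or rejected by thresholding `combined_score`. The weight `alpha` would have to be tuned on held-out data.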