Rapid advances in speech processing technology have made speech retrieval feasible. This paper designs and implements a content-based Chinese speech document retrieval system built on keyword spotting and text classification. The system converts a segment of unknown spontaneous speech into a sequence of keywords and then classifies it into a category, called a topic. The result is a retrieval model with two levels of semantic information, which lets users search for the desired speech by keyword or by topic. In addition, based on mutual information, the text classification result is fed back to the spotted keywords to remove some false alarms. The paper describes the structure, principles, and implementation status of the retrieval system, and presents experimental results and discussion.
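The feedback step, in which the topic decision is used to prune spotted keywords, can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: pointwise mutual information (PMI) is estimated from document-level co-occurrence counts, the topic is chosen by summed PMI, and the names `train_pmi`, `classify`, `filter_false_alarms`, the `unseen_penalty` value, and the threshold are all hypothetical.

```python
import math
from collections import Counter


def train_pmi(docs):
    """Estimate PMI(keyword, topic) from labelled documents.

    docs: list of (topic, keywords) pairs. Counts are per document
    (a keyword is counted once per document), which is an assumption.
    """
    n_docs = len(docs)
    topic_count = Counter()
    kw_count = Counter()
    joint = Counter()
    for topic, keywords in docs:
        topic_count[topic] += 1
        for kw in set(keywords):
            kw_count[kw] += 1
            joint[(kw, topic)] += 1
    # PMI = log( P(kw, topic) / (P(kw) * P(topic)) )
    return {
        (kw, topic): math.log(c * n_docs / (kw_count[kw] * topic_count[topic]))
        for (kw, topic), c in joint.items()
    }


def classify(keywords, pmi, topics, unseen_penalty=-1.0):
    """Pick the topic with the highest summed PMI over the spotted keywords.

    Keyword/topic pairs never seen in training contribute unseen_penalty
    (a hypothetical smoothing choice).
    """
    def score(topic):
        return sum(pmi.get((kw, topic), unseen_penalty) for kw in keywords)
    return max(topics, key=score)


def filter_false_alarms(keywords, topic, pmi, threshold=0.0):
    """Keep only keywords whose PMI with the chosen topic exceeds threshold."""
    return [kw for kw in keywords
            if pmi.get((kw, topic), float("-inf")) > threshold]


# Toy training data (invented for this sketch, not from the paper).
TRAIN = [
    ("sports", ["football", "goal", "team"]),
    ("sports", ["team", "match", "goal"]),
    ("finance", ["stock", "market", "bank"]),
    ("finance", ["bank", "stock", "team"]),
]

if __name__ == "__main__":
    pmi = train_pmi(TRAIN)
    spotted = ["goal", "team", "stock", "match"]  # "stock" plays the false alarm
    topic = classify(spotted, pmi, ["sports", "finance"])
    print(topic, filter_false_alarms(spotted, topic, pmi))
```

In this toy run the spotted keyword "stock" never co-occurs with the chosen topic in the training data, so it is treated as a likely false alarm and dropped. A real system would presumably also weigh the keyword spotter's confidence scores, which this sketch ignores.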