Our Mandarin spoken document retrieval system considers the effects of both the retrieval source and the retrieval model. For the retrieval source, we adopt the syllable lattice, which mitigates the effect of speech recognition errors on document retrieval. For the retrieval model, we combine a document length prior with the Jelinek-Mercer smoothing technique, which is widely applied in text document retrieval models. As far as we know, this is the first time a syllable lattice has been combined with a retrieval model based on the document length prior for spoken document retrieval. Experimental results show that the lattice-based method outperforms the 1-best method. Furthermore, with the document length prior in the retrieval model, the lattice-based approach achieves the best performance, an improvement of about 30%.
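As a rough illustration of the kind of retrieval model described above, the following sketch scores documents with a Jelinek-Mercer smoothed query likelihood plus a log document length prior. It is not the authors' implementation. The prior proportional to document length, the use of expected syllable counts as the lattice-derived statistics, the value lam=0.5, and all names (jm_length_prior_score, rank, the toy syllables) are assumptions made for this example.

```python
import math


def jm_length_prior_score(query, doc_counts, coll_counts, coll_len, lam=0.5):
    """Log query likelihood with Jelinek-Mercer smoothing and a length prior.

    doc_counts / coll_counts map a syllable to its (possibly fractional)
    expected count, e.g. as might be accumulated from a syllable lattice.
    The prior P(d) is assumed proportional to document length.
    """
    doc_len = sum(doc_counts.values())
    # Assumed document length prior: P(d) = |d| / |C|.
    score = math.log(doc_len / coll_len)
    for term in query:
        p_doc = doc_counts.get(term, 0.0) / doc_len
        p_coll = coll_counts.get(term, 0.0) / coll_len
        # Jelinek-Mercer: linear interpolation of document and collection models.
        p = (1.0 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Return (doc_id, score) pairs sorted from best to worst."""
    coll_counts = {}
    for counts in docs.values():
        for term, c in counts.items():
            coll_counts[term] = coll_counts.get(term, 0.0) + c
    coll_len = sum(coll_counts.values())
    scored = [
        (doc_id, jm_length_prior_score(query, counts, coll_counts, coll_len, lam))
        for doc_id, counts in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy expected syllable counts (hypothetical values, not from the paper).
docs = {
    "d1": {"bei": 0.9, "jing": 0.8, "da": 0.3},
    "d2": {"shang": 1.0, "hai": 1.0, "bei": 0.1},
}
ranking = rank(["bei", "jing"], docs)
```

With a 1-best transcript the counts would be integers taken from a single recognition hypothesis; with fractional expected counts, syllables that appear only in competing hypotheses can still contribute to the score.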