An efficient, scalable speech recognition architecture is proposed for multi-domain dialog systems, combining topic detection with topic-dependent language modeling. The domain is automatically inferred from the user's utterance, and speech recognition is then performed with the corresponding domain-dependent language model. The architecture improves accuracy and efficiency over current approaches and scales to a large number of domains. In this paper, unigram-likelihood and SVM-based topic detection methods are compared. A novel framework using a multi-layer hierarchy of language models is also introduced to improve robustness against topic detection errors. The proposed system provides a relative reduction in word error rate (WER) of 10.3% over a single language model system. Furthermore, it achieves accuracy comparable to running multiple language models in parallel while requiring only a fraction of the computational cost.
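As an illustration of the unigram-likelihood topic detection step, the following is a minimal sketch rather than the authors' implementation. It assumes that the utterance is already available as text (for example, a first-pass recognition hypothesis), that each domain has a small training corpus, and that add-alpha smoothing is used; the function names `train_unigram` and `detect_topic`, the toy corpora, and the smoothing choice are all hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences, vocab, alpha=1.0):
    """Estimate add-alpha smoothed unigram log-probabilities for one domain.

    Assumption: add-alpha smoothing over a shared vocabulary; the paper's
    actual estimation method is not specified in the abstract.
    """
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: math.log((counts[w] + alpha) / total) for w in vocab}


def detect_topic(utterance, models):
    """Return the domain whose unigram model scores the utterance highest."""
    words = utterance.split()

    def score(domain):
        lm = models[domain]
        # Words outside the shared vocabulary are skipped for every domain.
        return sum(lm[w] for w in words if w in lm)

    return max(models, key=score)


# Hypothetical toy corpora standing in for per-domain training data.
corpora = {
    "hotel": ["i want to book a room", "is breakfast included in the room rate"],
    "flight": ["i want to book a flight", "what time does the flight depart"],
}
vocab = {w for sents in corpora.values() for s in sents for w in s.split()}
models = {d: train_unigram(sents, vocab) for d, sents in corpora.items()}

# The detected domain would then select the domain-dependent language model
# used for the second recognition pass.
domain = detect_topic("book a room with breakfast", models)
print(domain)
```

In a full system, the detected domain would index into a set of domain-dependent language models; a multi-layer hierarchy could instead back off to a broader model when the detection scores of the top domains are close.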