In this paper, we propose a new framework for constructing corpus-based, topic-sensitive language models of highly inflected languages for large-vocabulary speech recognition. We concentrate on a feature extraction process designed for languages in which words are formed with many different inflectional affixes. In our approach, all words with the same meaning but different grammatical forms are automatically collected into one cluster using a fuzzy comparison function. A topic classifier is used to select a sub-corpus from a large collection of training text. Language models are built by interpolating topic-specific models with a general model. Results of experiments on English and Slovenian corpora are reported.
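The two core steps of the framework, clustering inflected word forms and interpolating a topic-specific model with a general model, can be illustrated with a minimal sketch. The abstract does not define the fuzzy comparison function or the interpolation scheme, so both are assumptions here: `fuzzy_similar` is a hypothetical shared-prefix test (with arbitrary thresholds `min_stem` and `min_ratio`), and the models are reduced to unigram distributions combined by standard linear interpolation with a weight `lam`. None of these names come from the paper.

```python
from collections import Counter
from os.path import commonprefix


def fuzzy_similar(a: str, b: str, min_stem: int = 4, min_ratio: float = 0.6) -> bool:
    """Hypothetical fuzzy comparison: two word forms match if they share
    a sufficiently long common prefix relative to the longer word."""
    stem = commonprefix([a, b])
    return len(stem) >= min_stem and len(stem) / max(len(a), len(b)) >= min_ratio


def cluster_forms(words):
    """Greedy clustering: each distinct word joins the first cluster whose
    representative (first member) it matches, otherwise starts a new one."""
    clusters = []
    for w in sorted(set(words)):
        for c in clusters:
            if fuzzy_similar(c[0], w):
                c.append(w)
                break
        else:
            clusters.append([w])
    return clusters


def unigram_probs(tokens):
    """Maximum-likelihood unigram distribution (a stand-in for a full n-gram model)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(p_topic, p_general, lam):
    """Assumed linear interpolation: P(w) = lam * P_topic(w) + (1 - lam) * P_general(w)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(p_topic) | set(p_general)
    return {
        w: lam * p_topic.get(w, 0.0) + (1 - lam) * p_general.get(w, 0.0)
        for w in vocab
    }


# Slovenian forms of "knjiga" (book) fall into one cluster; "miza" (table) stays separate.
print(cluster_forms(["knjigi", "miza", "knjiga", "knjigo", "knjige"]))
```

Because each input distribution sums to one, any weight `lam` in [0, 1] yields a mixture that also sums to one; in practice the weight would be tuned on held-out, topic-relevant text.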