The traditional language model takes a document corpus covering multiple topics as its research target. To avoid the interference caused by this multi-topic problem, this paper focuses on domain-specific Information Retrieval (IR). In domain-specific IR, different terms are considered to contribute to the final query result to different degrees, so the terms in a document can be divided into categories according to their degree of contribution. The statistical information of a term, mainly its probabilities, is then computed with different methods and smoothing strategies depending on its category. This paper proposes an improved hybrid statistical language model for domain-specific IR. In the experiments, the new model improves performance by about 9% to 10%. Finally, some challenges and research directions for statistical language modeling are presented.
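The abstract does not say which categories or smoothing methods the model uses. The Python sketch below only illustrates the general idea of category-dependent smoothing in a query-likelihood model, under stated assumptions. It uses two hypothetical categories, "domain" and "general", and assigns a term to "domain" when it appears in a hypothetical domain vocabulary. Domain terms get Dirichlet prior smoothing and general terms get Jelinek-Mercer smoothing. The parameters `mu` and `lam`, the helper names, and the toy data are all illustrative choices. They are not the paper's hybrid model.

```python
import math
from collections import Counter


def collection_model(docs):
    """Maximum-likelihood P(term | collection) over all documents."""
    counts = Counter(t for d in docs for t in d)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def term_prob(term, doc_tf, doc_len, coll_prob, category, mu=2000.0, lam=0.7):
    """Smoothed P(term | document); the smoothing method depends on the term's category.

    Assumption: "domain" terms use Dirichlet prior smoothing, all other terms
    use Jelinek-Mercer interpolation. This mapping is hypothetical.
    """
    tf = doc_tf.get(term, 0)
    p_c = coll_prob.get(term, 1e-9)  # small floor for terms unseen in the collection
    if category == "domain":
        # Dirichlet prior: (tf + mu * P(t|C)) / (|d| + mu)
        return (tf + mu * p_c) / (doc_len + mu)
    # Jelinek-Mercer: lam * P_ml(t|d) + (1 - lam) * P(t|C)
    p_ml = tf / doc_len if doc_len else 0.0
    return lam * p_ml + (1 - lam) * p_c


def score(query_terms, doc_terms, coll_prob, domain_vocab):
    """Log query likelihood of a document with category-dependent smoothing."""
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    total = 0.0
    for t in query_terms:
        category = "domain" if t in domain_vocab else "general"  # hypothetical categorization
        total += math.log(term_prob(t, doc_tf, doc_len, coll_prob, category))
    return total


# Toy usage: two tiny documents and a hypothetical domain vocabulary.
docs = [["retrieval", "model", "smoothing"], ["weather", "report", "model"]]
coll = collection_model(docs)
domain_vocab = {"retrieval", "smoothing"}
query = ["retrieval", "model"]
scores = [score(query, d, coll, domain_vocab) for d in docs]
print(scores)  # the first document, which contains "retrieval", scores higher
```

Choosing the smoothing method per category is the point of the sketch. A real system would need a principled way to assign categories and tune `mu` and `lam` on domain data.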