Categorizing unrestricted text is an important problem in information retrieval. The crux of the problem is to find a model that relates the words in a document to its general subject area. Acquiring enough word-based knowledge statistically to build a robust system that categorizes unrestricted text automatically appears to be very difficult. The major problems with word-based text categorization models are data sparseness and the lack of any level of abstraction. Word-based systems are hard to train sufficiently well; furthermore, they are difficult to port to new domains and to run off the shelf. In this paper, we show that a concept-based model for text categorization requires fewer parameters and has a built-in element of generality. Broad lexical conceptual knowledge acquired from machine-readable dictionaries (MRDs) can be used to produce a robust and portable text categorization system. To assess the performance of the proposed method, we conducted a series of experiments categorizing on-line news obtained from the Internet. Experimental results show that MRDs function effectively as a knowledge base for assigning subject areas to news articles and for text categorization in general.