In this research, we propose an index optimization scheme that aims to maximize both the performance and the efficiency of information retrieval systems, and we apply the proposed KNN variant to it. Index optimization is mapped into a classification task within a domain, which should be distinguished from topic-based word categorization. In the proposed system, the input is a text tagged with its own domain. The words indexed from that text are classified into expansion, inclusion, or removal by the feature similarity based version of KNN. We verified empirically that the proposed KNN works better than the traditional version in optimizing the indexes of news articles. In future work, we will connect this task with text categorization so that texts untagged with their domains can also be processed.
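To make the classification framing concrete, here is a minimal sketch of the *traditional* KNN baseline applied to index optimization: each indexed word is a feature vector, and it receives the majority label (expansion, inclusion, or removal) of its k nearest training words. This is an assumption-laden illustration, not the authors' method. The abstract does not define the feature similarity measure of the proposed KNN, so plain Euclidean distance stands in for it. The three per-word features, the training examples, and the names `knn_classify` and `optimize_index` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical per-word features, each scaled to [0, 1]:
# (normalized term frequency in the text, relevance to the tagged domain, appears in title)
TRAINING = [
    ((0.9, 0.9, 1.0), "expansion"),
    ((0.8, 0.8, 1.0), "expansion"),
    ((0.5, 0.6, 0.0), "inclusion"),
    ((0.4, 0.5, 0.0), "inclusion"),
    ((0.1, 0.1, 0.0), "removal"),
    ((0.2, 0.0, 0.0), "removal"),
]


def knn_classify(query, training, k=3):
    """Return the majority label among the k training words closest to `query`.

    Traditional KNN: closeness is Euclidean distance. The paper's feature
    similarity based variant would replace this distance (assumption: its
    exact formula is not given in the abstract).
    """
    nearest = sorted(training, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def optimize_index(indexed_words, training, k=3):
    """Map each indexed word of a domain-tagged text to expansion/inclusion/removal."""
    return {word: knn_classify(feats, training, k) for word, feats in indexed_words.items()}


if __name__ == "__main__":
    # Hypothetical words indexed from a news article tagged with a politics domain.
    words = {
        "election": (0.85, 0.85, 1.0),
        "said": (0.15, 0.05, 0.0),
        "vote": (0.45, 0.55, 0.0),
    }
    print(optimize_index(words, TRAINING))
    # {'election': 'expansion', 'said': 'removal', 'vote': 'inclusion'}
```

Under this sketch, words labeled removal would be dropped from the index, and words labeled expansion would be candidates for adding related terms. Swapping the distance function is the only change needed to compare the baseline against a different similarity measure.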