The traditional TF-IDF algorithm is a common method for measuring feature weights in text categorization. However, it does not take into account how feature terms are distributed between classes (inter-class) and within a class (intra-class), so it cannot effectively reflect the distribution proportion of feature terms. To solve this problem, inter-class and intra-class information entropy, which describes the distribution of feature terms, was used to revise the TF-IDF weight. Simulation experiments demonstrated that the improved TF-IDF algorithm achieves better classification results than the traditional TF-IDF algorithm.
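
The abstract does not state the exact revised weighting formula, so the following is only a minimal sketch of one way an entropy correction to TF-IDF could look. It rests on assumptions that are not taken from the paper: the function names (`entropy_weighted_tfidf`, `normalized_entropy`) are hypothetical, entropy is normalized by the logarithm of the number of bins, and the two factors are combined as `(1 + (1 - H_inter)) * (1 + H_intra)`. The idea illustrated is that a term concentrated in few classes (low inter-class entropy) and spread evenly over the documents of its class (high intra-class entropy) should receive a larger weight.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy (natural log) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def normalized_entropy(counts):
    """Entropy scaled to [0, 1] by log(number of bins); 0.0 for fewer than 2 bins."""
    if len(counts) < 2:
        return 0.0
    return entropy(counts) / math.log(len(counts))


def entropy_weighted_tfidf(docs, labels):
    """Return one {term: weight} dict per document.

    docs   -- list of token lists
    labels -- class label of each document
    The combination rule below is an illustrative assumption,
    not the formula from the paper.
    """
    n_docs = len(docs)
    classes = sorted(set(labels))
    tfs = [Counter(doc) for doc in docs]
    df = Counter(term for tf in tfs for term in tf)  # document frequency

    # Group document indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    weights = []
    for i, tf in enumerate(tfs):
        doc_len = sum(tf.values())
        same_class = by_class[labels[i]]
        row = {}
        for term, count in tf.items():
            tfidf = (count / doc_len) * math.log(n_docs / df[term])
            # Inter-class entropy: how evenly the term spreads over classes.
            inter = normalized_entropy(
                [sum(tfs[j][term] for j in by_class[c]) for c in classes]
            )
            # Intra-class entropy: how evenly it spreads over the documents
            # of this document's own class.
            intra = normalized_entropy([tfs[j][term] for j in same_class])
            row[term] = tfidf * (1 + (1 - inter)) * (1 + intra)
        weights.append(row)
    return weights
```

With this sketch, two terms that have the same plain TF-IDF value can end up with different weights: a term occurring in every document of one class only is boosted relative to a term that occurs equally often in several classes.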