Based on an analysis of the shortcomings of the traditional mutual information method, this paper proposes a mutual information feature selection algorithm for text based on a second-pass ("quadratic") TF*IDF weighting. The algorithm re-measures the importance of feature words that appear in only one category, which solves the problem that features cannot be selected effectively when their mutual information values are equal. Experiments with a Bayesian classifier show that the algorithm achieves somewhat better efficiency and accuracy in text classification than the original method.
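The tie problem mentioned above can be illustrated with a small sketch. It is not the authors' formulation, which the abstract does not give. It assumes pointwise mutual information estimated from document counts, and a hypothetical tie-breaker `tf_idf` defined as the term's frequency within the category times log(N / df). Under this estimate, every term that occurs in only one category receives the same mutual information value, log(N / N_c), regardless of how often it occurs, so a second measure is needed to order such terms.

```python
import math


def mutual_information(term, category, docs):
    """Pointwise MI, log(P(t|c) / P(t)), estimated from document counts."""
    n = len(docs)
    n_c = sum(1 for _, label in docs if label == category)
    n_t = sum(1 for tokens, _ in docs if term in tokens)
    n_tc = sum(1 for tokens, label in docs if label == category and term in tokens)
    if n_tc == 0:
        return float("-inf")  # term never occurs in this category
    return math.log((n_tc * n) / (n_t * n_c))


def tf_idf(term, category, docs):
    """Assumed tie-breaker: in-category term frequency times log(N / df)."""
    tf = sum(tokens.count(term) for tokens, label in docs if label == category)
    df = sum(1 for tokens, _ in docs if term in tokens)
    return tf * math.log(len(docs) / df) if df else 0.0


def rank_features(category, docs):
    """Rank the category's terms by MI, breaking MI ties with TF*IDF."""
    vocab = sorted({t for tokens, label in docs if label == category for t in tokens})
    return sorted(
        vocab,
        key=lambda t: (mutual_information(t, category, docs), tf_idf(t, category, docs)),
        reverse=True,
    )


# Toy corpus (invented for illustration): (tokens, category label)
docs = [
    (["goal", "goal", "goal", "team"], "sport"),
    (["goal", "referee", "team"], "sport"),
    (["vote", "team"], "politics"),
    (["vote", "senate"], "politics"),
]

# "goal" and "referee" occur only in "sport", so their MI values are equal;
# the TF*IDF term places the more frequent "goal" first.
print(rank_features("sport", docs))  # ['goal', 'referee', 'team']
```

In this toy corpus, "goal" and "referee" both score log 2 on mutual information, and only the secondary weighting separates them.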