This paper compares several commonly used feature selection methods for text categorization and proposes a new one, TFIDF_Ci, which introduces a class-discriminating weighted frequency. The method incorporates the document frequency of a term within a specific class into the TFIDF function, which improves the ability of a feature term to distinguish the class of the documents it occurs in from the other classes. In the experiments, TFIDF_Ci and the other feature selection methods were compared using a KNN classifier. The results show that TFIDF_Ci achieves higher classification accuracy and stability than the other methods across different training set sizes.
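The abstract does not give the TFIDF_Ci formula, so the following is only an illustrative sketch of the general idea: a TF-IDF score multiplied by a class-specific document-frequency ratio. The function name `tfidf_ci_scores`, the exact weighting (in-class term frequency × log inverse document frequency × share of the term's documents that belong to the class), and the toy corpus are all assumptions made for illustration, not the authors' definition.

```python
import math
from collections import Counter


def tfidf_ci_scores(docs, labels, target_class):
    """Score the terms of one class (hypothetical weighting, not the paper's).

    docs   : list of tokenized documents (lists of strings)
    labels : class label of each document
    Returns a dict mapping term -> score for `target_class`.
    """
    n_docs = len(docs)
    class_docs = [d for d, y in zip(docs, labels) if y == target_class]

    # Document frequency over the whole corpus and within the target class.
    df_all = Counter()
    for doc in docs:
        df_all.update(set(doc))
    df_class = Counter()
    for doc in class_docs:
        df_class.update(set(doc))

    # Term frequency accumulated over the documents of the target class.
    tf_class = Counter()
    for doc in class_docs:
        tf_class.update(doc)
    total_terms = sum(tf_class.values())

    scores = {}
    for term, count in tf_class.items():
        tf = count / total_terms
        idf = math.log(n_docs / df_all[term])
        # Assumed class factor: fraction of the term's documents in this class.
        class_ratio = df_class[term] / df_all[term]
        scores[term] = tf * idf * class_ratio
    return scores


def select_features(docs, labels, target_class, k):
    """Return the k highest-scoring terms for the class."""
    scores = tfidf_ci_scores(docs, labels, target_class)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy corpus (invented for illustration only).
docs = [
    ["ball", "goal", "team"],
    ["goal", "match", "team"],
    ["stock", "market", "team"],
    ["market", "price", "stock"],
]
labels = ["sport", "sport", "finance", "finance"]
sport_scores = tfidf_ci_scores(docs, labels, "sport")
```

In this toy corpus, "team" appears in both classes and therefore scores lower for "sport" than "goal", which appears only in sport documents. The selected terms could then serve as the feature space for a KNN classifier, as in the experiments the abstract describes.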