Feature selection is a critical step in text categorization. This paper proposes a new feature selection algorithm, the relative contribution (RC) method, which measures a feature's relative contribution by the sum of the differences between its frequency and the frequencies of the other features, and selects features accordingly. The method is evaluated on the 20 Newsgroups benchmark data set with two classifiers, naive Bayes and support vector machines. The experimental results show that, compared with four widely used feature selection algorithms (information gain, mutual information, odds ratio, and the DIA association factor), the proposed method effectively reduces the feature dimensionality of the text and improves classification accuracy.
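The abstract does not give the scoring formula, so the following is only a minimal sketch of one literal reading of it: each feature is scored by the sum of the differences between its frequency and the frequency of every other feature, and the top-scoring features are kept. The function names `relative_contribution_scores` and `select_top_k`, the dictionary input, and the toy frequencies are all assumptions made for illustration, not the authors' implementation. Under this literal reading the ranking coincides with a ranking by raw frequency; the published method may well condition the frequencies on class or normalize them, which the abstract does not specify.

```python
from typing import Dict, List


def relative_contribution_scores(freqs: Dict[str, float]) -> Dict[str, float]:
    """Score each feature by the summed difference to all other features.

    Hypothetical reading of the abstract: score(i) = sum over j != i of (f_i - f_j).
    """
    n = len(freqs)
    total = sum(freqs.values())
    # sum_{j != i} (f_i - f_j) = (n - 1) * f_i - (total - f_i) = n * f_i - total
    return {term: n * f - total for term, f in freqs.items()}


def select_top_k(freqs: Dict[str, float], k: int) -> List[str]:
    """Return the k features with the highest score (ties broken alphabetically)."""
    scores = relative_contribution_scores(freqs)
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


# Toy frequencies, invented purely for illustration.
toy_freqs = {"ball": 5, "game": 3, "the": 1}
selected = select_top_k(toy_freqs, 2)
```

With the toy counts above, `selected` holds the two terms whose frequency exceeds or matches the average, which is the dimensionality reduction step in its simplest form.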