Four commonly used text feature extraction methods are introduced: IG (information gain), MI (mutual information), CHI (the χ² statistic), and DF (document frequency). Two improved methods based on CHI are then proposed and evaluated experimentally. The results show that the improved methods increase the accuracy of text categorization.
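The abstract does not spell out the four baseline measures. The sketch below uses the standard two-class textbook formulations, which are an assumption on our part. It scores a term t against a category c from a 2×2 document contingency table. The count names `a`, `b`, `c`, `d` and all function names are hypothetical illustrations, not taken from the paper. The sketch also does not implement the paper's improved CHI variants, because the abstract does not describe them.

```python
import math

# Hypothetical contingency counts for term t and category c:
#   a = docs in c that contain t      b = docs not in c that contain t
#   c = docs in c without t           d = docs not in c without t

def _entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts (zero counts contribute 0)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for n in counts:
        if n > 0:
            p = n / total
            h -= p * math.log2(p)
    return h

def document_frequency(a, b, c, d):
    """DF: number of documents containing the term, regardless of category."""
    return a + b

def mutual_information(a, b, c, d):
    """MI (pointwise, base 2 assumed): log2(P(t, c) / (P(t) * P(c)))."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")  # term never co-occurs with the category
    return math.log2(a * n / ((a + b) * (a + c)))

def chi_square(a, b, c, d):
    """CHI: N * (AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom

def information_gain(a, b, c, d):
    """IG: H(class) minus the expected class entropy after observing presence/absence of t."""
    n = a + b + c + d
    h_class = _entropy([a + c, b + d])   # documents in c vs. not in c
    p_t = (a + b) / n                    # fraction of documents containing t
    h_with_t = _entropy([a, b])          # class split among documents containing t
    h_without_t = _entropy([c, d])       # class split among documents lacking t
    return h_class - (p_t * h_with_t + (1 - p_t) * h_without_t)

if __name__ == "__main__":
    counts = (40, 10, 10, 40)  # illustrative counts, not data from the paper
    print("DF :", document_frequency(*counts))
    print("MI :", round(mutual_information(*counts), 4))
    print("CHI:", round(chi_square(*counts), 4))
    print("IG :", round(information_gain(*counts), 4))
```

In a feature-selection pipeline, each candidate term would typically be scored this way and only the top-ranked terms kept. With several categories, the per-category MI or CHI scores are commonly combined, for example by taking their maximum or average.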