Feature selection is a key step in text mining. After word segmentation, the documents in the training set form the full feature space of the text classification system, but this space is usually very high-dimensional, reaching hundreds of thousands of dimensions. Feature selection sharply reduces the dimensionality of the space and also reduces the impact of noise, which in turn improves both the speed and the precision of the classifier. Starting from the traditional mutual information (MI) method, this paper proposes an improved version and verifies its effectiveness experimentally.
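For orientation, the traditional mutual information criterion that the paper takes as its starting point can be sketched as follows. This is a minimal illustration of the textbook formulation, MI(t, c) = log(P(t, c) / (P(t) P(c))), estimated from document counts; it is not the improved method proposed in the paper, whose details the abstract does not give. The function names `mutual_information_scores` and `select_features`, the choice to take the maximum over classes, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def mutual_information_scores(docs, labels):
    """Score each term by its maximum pointwise MI over all classes.

    docs:   list of token lists (documents already segmented into words)
    labels: one class label per document
    """
    n_docs = len(docs)
    class_count = Counter(labels)        # number of documents per class
    term_count = Counter()               # number of documents containing each term
    joint_count = defaultdict(Counter)   # joint_count[term][cls]: docs of cls containing term

    for tokens, cls in zip(docs, labels):
        for term in set(tokens):         # count document frequency, not term frequency
            term_count[term] += 1
            joint_count[term][cls] += 1

    scores = {}
    for term, per_class in joint_count.items():
        # MI(t, c) = log( P(t, c) / (P(t) * P(c)) ), estimated from counts.
        # Classes in which the term never occurs are skipped (MI would be -inf).
        scores[term] = max(
            math.log(count * n_docs / (term_count[term] * class_count[cls]))
            for cls, count in per_class.items()
        )
    return scores


def select_features(docs, labels, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = mutual_information_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (made up for illustration): two classes, four documents.
docs = [
    ["goal", "team", "the"],
    ["team", "match", "the"],
    ["stock", "market", "the"],
    ["market", "price", "the"],
]
labels = ["sport", "sport", "finance", "finance"]

scores = mutual_information_scores(docs, labels)
top_terms = select_features(docs, labels, 6)
```

In this toy corpus the word "the" appears in every document, so it carries no class information and is the one term left out when the top six are kept. The sketch also shows a commonly noted weakness of the plain criterion: a term seen in a single document ("goal") scores as high as one seen in every document of its class ("team"), so rare terms tend to be favoured.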