Information gain is an effective feature selection method in text classification. To address feature selection in spam filtering, this paper proposes an improved information gain method for extracting feature words and adopts a minimum-risk Bayesian decision rule. Experiments on an English corpus show that the improved method reduces the filter's misclassification of legitimate email.
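As an illustration of the two ingredients named above, the following is a minimal sketch of the standard (unimproved) information gain score for a binary term feature, followed by a two-class minimum-risk decision. The abstract does not describe the paper's improvement, so none is reproduced here. The function names, the document-count arguments, and the cost values `cost_fp` and `cost_fn` are assumptions chosen for illustration, not values from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(spam_with, legit_with, spam_without, legit_without):
    """Standard information gain of a binary term feature.

    Arguments are document counts: spam/legitimate messages that contain
    the term, and spam/legitimate messages that do not (hypothetical names).
    """
    n = spam_with + legit_with + spam_without + legit_without
    if n == 0:
        return 0.0
    # Entropy of the class label before observing the term.
    h_class = entropy([(spam_with + spam_without) / n,
                       (legit_with + legit_without) / n])
    # Expected entropy of the class label after observing term presence/absence.
    h_cond = 0.0
    for spam, legit in ((spam_with, legit_with), (spam_without, legit_without)):
        total = spam + legit
        if total:
            h_cond += (total / n) * entropy([spam / total, legit / total])
    return h_class - h_cond


def min_risk_decision(p_spam, cost_fp=9.0, cost_fn=1.0):
    """Pick the label with the lower expected loss.

    cost_fp: assumed loss for flagging a legitimate message as spam.
    cost_fn: assumed loss for letting a spam message through.
    """
    risk_if_spam = cost_fp * (1.0 - p_spam)   # wrong only if the mail is legitimate
    risk_if_legit = cost_fn * p_spam          # wrong only if the mail is spam
    return "spam" if risk_if_spam < risk_if_legit else "legit"
```

With these assumed costs, a message is flagged as spam only when its posterior spam probability exceeds 0.9. This is how a minimum-risk rule trades some missed spam for fewer misjudged legitimate messages.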