Conference: International Conference on Multimedia Information Networking and Security

Spam Feature Selection Based on the Improved Mutual Information Algorithm

Abstract

Content-based spam filtering technologies generally rely on feature selection algorithms for mail classification. Building on the mutual information feature selection algorithm, this paper proposes two improved methods: mutual information with frequency (MIf), which introduces a word frequency factor, and mutual information with average frequency (MIaf), which introduces a word average frequency factor. Simulation experiments are conducted on the English corpus PU1 (its lemm_stop version) and the Chinese CCERT email data set: feature subsets are extracted with the improved algorithms, and the mails are then classified with the Naïve Bayes algorithm. The experimental results show that the improved mutual information algorithms can select better feature subsets and improve mail classification.
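The pipeline the abstract describes (score terms with mutual information, keep the top-scoring subset, classify with Naïve Bayes) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation. Plain mutual information is computed as the expected mutual information between a term's presence and the class label. The abstract does not give the MIf and MIaf formulas, so `frequency_weighted_scores` uses two hypothetical weighting factors: log(1 + total term frequency) as a stand-in for the "word frequency factor", and occurrences per containing document as a stand-in for the "word average frequency factor". All function names, the toy corpus, and `k` are illustrative.

```python
import math
from collections import Counter

# A document is a (tokens, label) pair, e.g. (["win", "money"], "spam").
Doc = tuple[list[str], str]


def mutual_information(docs: list[Doc]) -> dict[str, float]:
    """Expected mutual information I(T; C) between a term's presence T in {0, 1}
    and the class label C, estimated from document counts."""
    n = len(docs)
    class_counts = Counter(label for _, label in docs)
    df = Counter()  # df[(term, label)]: documents of class `label` containing `term`
    for tokens, label in docs:
        for term in set(tokens):
            df[(term, label)] += 1
    scores = {}
    for term in {t for t, _ in df}:
        df_term = sum(df[(term, c)] for c in class_counts)
        mi = 0.0
        for c, n_c in class_counts.items():
            p_c = n_c / n
            for present, joint in ((True, df[(term, c)]), (False, n_c - df[(term, c)])):
                if joint == 0:
                    continue  # 0 * log 0 is treated as 0
                p_t = (df_term if present else n - df_term) / n
                p_joint = joint / n
                mi += p_joint * math.log(p_joint / (p_t * p_c))
        scores[term] = mi
    return scores


def frequency_weighted_scores(docs: list[Doc], mode: str = "f") -> dict[str, float]:
    """HYPOTHETICAL stand-ins for MIf / MIaf (the paper's formulas are not in the
    abstract). 'f' scales MI by log(1 + total term frequency); 'af' scales MI by
    the term's average frequency per document that contains it."""
    mi = mutual_information(docs)
    tf = Counter()      # total occurrences of each term in the corpus
    n_docs = Counter()  # number of documents containing each term
    for tokens, _ in docs:
        tf.update(tokens)
        n_docs.update(set(tokens))
    if mode == "f":
        return {t: s * math.log1p(tf[t]) for t, s in mi.items()}
    if mode == "af":
        return {t: s * tf[t] / n_docs[t] for t, s in mi.items()}
    raise ValueError(f"unknown mode: {mode!r}")


def select_features(scores: dict[str, float], k: int) -> set[str]:
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])


class MultinomialNB:
    """Multinomial Naive Bayes with add-one smoothing over the selected features."""

    def fit(self, docs: list[Doc], features: set[str]) -> "MultinomialNB":
        self.features = features
        self.classes = sorted({label for _, label in docs})
        self.log_prior: dict[str, float] = {}
        self.log_lik: dict[str, dict[str, float]] = {}
        for c in self.classes:
            class_docs = [tokens for tokens, label in docs if label == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for tokens in class_docs for t in tokens if t in features)
            total = sum(counts.values())
            self.log_lik[c] = {
                t: math.log((counts[t] + 1) / (total + len(features))) for t in features
            }
        return self

    def predict(self, tokens: list[str]) -> str:
        def log_posterior(c: str) -> float:
            # Terms outside the selected feature subset are ignored.
            return self.log_prior[c] + sum(
                self.log_lik[c][t] for t in tokens if t in self.features
            )
        return max(self.classes, key=log_posterior)


train = [
    ("win money now".split(), "spam"),
    ("cheap money offer win".split(), "spam"),
    ("meeting schedule for monday".split(), "ham"),
    ("project meeting notes".split(), "ham"),
]
features = select_features(frequency_weighted_scores(train, mode="f"), k=4)
clf = MultinomialNB().fit(train, features)
print(clf.predict("win cheap money".split()))  # prints "spam"
```

In this toy corpus, "win", "money", and "meeting" each appear in every document of one class and none of the other, so each gets the maximum mutual information of log 2. The frequency factor then separates them further from terms that occur only once. Switching to `mode="af"` shows how a per-document average changes the ranking. Real experiments on PU1 or CCERT would need the paper's actual weighting formulas and tokenization.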