Published in Knowledge-Based Systems

Binary PSO with mutation operator for feature selection using decision tree applied to spam detection



Abstract

In this paper, we propose a novel spam detection method that focuses on reducing the false positive error, i.e., mislabeling nonspam as spam. First, we used a wrapper-based feature selection method to extract crucial features. Second, we chose a decision tree as the classifier model, with C4.5 as the training algorithm. Third, we introduced a cost matrix that assigns different weights to the two error types, i.e., the false positive and the false negative errors; the weight parameter α adjusts their relative importance. Fourth, we employed K-fold cross validation to obtain a reliable estimate of the out-of-sample error. Finally, we used binary PSO with a mutation operator (MBPSO) as the subset search strategy. Our experimental dataset contains 6000 emails collected during 2012. A Kolmogorov-Smirnov hypothesis test on the capital-run-length-related features gave p values below 0.001 for all of them. We then found that α ≈ 7 was the most appropriate setting for our model. Among seven meta-heuristic algorithms, we demonstrated that MBPSO is superior to GA, RSA, PSO, and BPSO in terms of classification performance. The sensitivity, specificity, and accuracy of the decision tree with feature selection by MBPSO were 91.02%, 97.51%, and 94.27%, respectively. We also compared MBPSO with conventional feature selection methods such as SFS and SBS, and MBPSO performed better than both. We further demonstrated that wrappers are more effective than filters with regard to classification performance indexes. These results show that the proposed method is effective and can reduce the false positive error without compromising sensitivity or accuracy.
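To make the pipeline in the abstract more concrete, here is a minimal sketch of a cost-weighted error and a binary PSO with a mutation operator, used as a wrapper search over feature masks. It is not the authors' code. The names `cost_weighted_error`, `mbpso` and `toy_fitness`, the parameters `w`, `c1`, `c2`, `v_max` and `p_mut`, the normalisation of the weighted error, and the bit-flip form of the mutation are all assumptions for illustration. The majority-vote classifier on synthetic data is a swapped-in stand-in for training C4.5 and scoring it with K-fold cross validation.

```python
import math
import random


def cost_weighted_error(y_true, y_pred, alpha):
    """Cost-sensitive error rate. Labels: 1 = spam, 0 = nonspam.

    A false positive (nonspam labelled spam) costs alpha; a false negative
    (spam labelled nonspam) costs 1. Dividing by the largest possible total
    cost is an assumption of this sketch, not necessarily the paper's formula.
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n_neg = sum(1 for t in y_true if t == 0)
    n_pos = len(y_true) - n_neg
    return (alpha * fp + fn) / (alpha * n_neg + n_pos)


def mbpso(fitness, n_bits, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5,
          v_max=4.0, p_mut=0.01, seed=0):
    """Minimise fitness(mask) over binary masks with binary PSO plus bit-flip mutation."""
    rng = random.Random(seed)

    def sigmoid(v):
        return 1.0 / (1.0 + math.exp(-v))

    # Random initial masks (1 = feature selected) and zero velocities.
    xs = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vs = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [fitness(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            x, v = xs[i], vs[i]
            for d in range(n_bits):
                # PSO velocity update (with inertia w), clamped to [-v_max, v_max].
                vel = (w * v[d]
                       + c1 * rng.random() * (pbest[i][d] - x[d])
                       + c2 * rng.random() * (gbest[d] - x[d]))
                v[d] = max(-v_max, min(v_max, vel))
                # Binary PSO: the sigmoid of the velocity is P(bit = 1).
                x[d] = 1 if rng.random() < sigmoid(v[d]) else 0
                # Mutation (assumed form): flip each bit with probability p_mut.
                if rng.random() < p_mut:
                    x[d] = 1 - x[d]
            f = fitness(x)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = x[:], f
                if f < gbest_f:
                    gbest, gbest_f = x[:], f
    return gbest, gbest_f


# Hypothetical toy data: 10 binary features; only features 0-2 follow the label.
_rng = random.Random(1)
y = [_rng.randint(0, 1) for _ in range(200)]
X_data = [[(t if _rng.random() < 0.9 else 1 - t) if d < 3 else _rng.randint(0, 1)
           for d in range(10)] for t in y]


def toy_fitness(mask, alpha=7.0):
    """Hypothetical stand-in for training C4.5 on the selected features and
    measuring cost-weighted K-fold error: a majority vote over selected features."""
    sel = [d for d, bit in enumerate(mask) if bit]
    if not sel:
        return cost_weighted_error(y, [0] * len(y), alpha)
    preds = [1 if 2 * sum(row[d] for d in sel) > len(sel) else 0 for row in X_data]
    return cost_weighted_error(y, preds, alpha)


best_mask, best_err = mbpso(toy_fitness, n_bits=10)
print("selected features:", [d for d, bit in enumerate(best_mask) if bit])
print("cost-weighted error:", round(best_err, 3))
```

The search only ever calls `fitness(mask)`, so in a real wrapper `toy_fitness` would be replaced by code that trains a decision tree on the selected columns and returns its cost-weighted cross-validated error. Note that scikit-learn's `DecisionTreeClassifier` implements CART rather than C4.5, so it would only approximate the classifier named in the abstract. Setting `alpha=7.0` mirrors the reported α ≈ 7, under this sketch's assumed error normalisation.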
