Conference: 2012 International Conference on Computer and Communication Engineering

GA-based feature subset selection in a spam/non-spam detection system


Abstract

Spam has created a significant security problem for computer users everywhere. Spammers use deceptive techniques to obscure the parts of a message that could be used to identify it as spam. Moreover, sending junk mail costs a spammer little in money or bandwidth, even when more than one hundred emails are sent. From the feature selection perspective, one specific problem that lowers the accuracy of spam/non-spam email classification is high data dimensionality. Reducing dimensionality therefore means reducing the number of irrelevant features. In this paper, a genetic algorithm (GA) is applied during feature selection to reduce the number of useless features in a high-dimensional collection of features drawn from email bodies and subjects. Next, a Multi-Layer Perceptron (MLP) is employed to classify emails using the features selected by the GA. Using the LingSpam benchmark corpus as the dataset, the experimental results show that a GA feature selector combined with the MLP classifier not only reduces the data dimensionality but also increases the spam detection rate compared with other classifiers such as SVM and Naïve Bayes.
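
The GA-based feature selection described in the abstract can be illustrated with a wrapper-style search over binary feature masks. The sketch below is not the authors' implementation, and several parts are assumptions made for illustration: synthetic data stands in for LingSpam, a nearest-centroid classifier stands in for the MLP so that the example needs only the standard library, and the GA operators (tournament selection, one-point crossover, bit-flip mutation, elitism) are common choices that the abstract does not specify. All names and parameters (`make_data`, `ga_select`, `penalty`, `mutation_rate`, and so on) are hypothetical.

```python
import random

def make_data(n_samples=120, n_features=12, n_informative=3, seed=0):
    """Synthetic stand-in for email feature vectors (assumption, not LingSpam).
    Only the first n_informative features depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # 1 = spam, 0 = non-spam
        X.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def centroid_accuracy(X_train, y_train, X_test, y_test, mask):
    """Held-out accuracy of a nearest-centroid classifier (stand-in for the MLP)
    that sees only the features whose mask bit is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_test, y_test):
        dist = {label: sum((x[j] - c) ** 2 for j, c in zip(cols, cen))
                for label, cen in centroids.items()}
        correct += (min(dist, key=dist.get) == t)
    return correct / len(y_test)

def fitness(mask, data, penalty=0.01):
    """Accuracy minus a small cost per selected feature (assumed trade-off)."""
    return centroid_accuracy(*data, mask) - penalty * sum(mask)

def ga_select(data, n_features, pop_size=16, generations=20,
              mutation_rate=0.05, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    pop[0] = [1] * n_features  # include the full feature set as a baseline
    best = max(pop, key=lambda m: fitness(m, data))
    for _ in range(generations):
        def tournament():
            a, b = rng.sample(pop, 2)
            return a if fitness(a, data) >= fitness(b, data) else b
        children = [best[:]]  # elitism: the best mask so far always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=lambda m: fitness(m, data))
    return best

X, y = make_data()
data = (X[:80], y[:80], X[80:], y[80:])  # train / held-out split
best_mask = ga_select(data, n_features=12)
print("selected features:", [j for j, bit in enumerate(best_mask) if bit])
```

Because the full-feature mask is seeded into the initial population and the best mask is carried forward unchanged each generation, the returned mask never scores worse than using every feature under this fitness function. Swapping in a real MLP would only change `centroid_accuracy`; the GA loop itself is independent of the classifier.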
