Published in: International Journal of Computer Network and Information Security

Comparative Analysis of Classification Algorithms for Email Spam Detection

Abstract

The growing use of email in everyday business transactions and general communication, driven by its cost-effectiveness and efficiency, has made email vulnerable to attacks, including spamming. Spam emails, also called junk emails, are unsolicited messages that are almost identical and are sent to multiple recipients at random. In this study, a performance analysis is carried out on the following classification algorithms: Bayesian Logistic Regression, Hidden Naïve Bayes, Radial Basis Function (RBF) Network, Voted Perceptron, Lazy Bayesian Rule, Logit Boost, Rotation Forest, NNge, Logistic Model Tree, REP Tree, Naïve Bayes, Multilayer Perceptron, Random Tree and J48. The performance of the algorithms was measured in terms of Accuracy, Precision, Recall, F-Measure, Root Mean Squared Error, Receiver Operating Characteristic (ROC) Area and Root Relative Squared Error, using the WEKA data mining tool. To give a balanced view of the classification algorithms' performance, no feature selection or performance boosting method was employed. The research showed that a number of classification algorithms exist that, if properly explored through feature selection, will yield more accurate results for email classification. Rotation Forest was found to be the classifier with the best accuracy, at 94.2%. Although none of the algorithms achieved 100% accuracy in sorting spam emails, Rotation Forest came closest to the most accurate result.
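
The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. This is not the study's code and does not call WEKA; the function names `classification_metrics` and `roc_area`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. Precision, recall and F-measure are computed here for the spam class only, and the Root Relative Squared Error baseline is taken as "always predict the mean label", so the figures WEKA reports (for example, weighted per-class averages) may differ in detail.

```python
import math


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Illustrative metrics for a binary spam classifier.

    y_true: list of 0/1 labels (1 = spam, 0 = legitimate).
    y_prob: list of predicted spam probabilities in [0, 1].
    threshold: assumed cut-off for turning a probability into a label.
    """
    n = len(y_true)
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    # Confusion-matrix counts, with spam as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    # Root Mean Squared Error between predicted probability and true label.
    squared_error = sum((p - t) ** 2 for t, p in zip(y_true, y_prob))
    rmse = math.sqrt(squared_error / n)

    # Root Relative Squared Error: error relative to a baseline that always
    # predicts the mean of the true labels (a simplifying assumption).
    mean_true = sum(y_true) / n
    baseline_error = sum((t - mean_true) ** 2 for t in y_true)
    rrse = math.sqrt(squared_error / baseline_error) if baseline_error else float("nan")

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "rmse": rmse,
        "rrse": rrse,
    }


def roc_area(y_true, y_prob):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen spam message receives a
    higher score than a randomly chosen legitimate one (ties count as half).
    """
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    labels = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.1, 0.6, 0.2, 0.3, 0.7]
    for name, value in classification_metrics(labels, scores).items():
        print(f"{name}: {value:.4f}")
    print(f"roc_area: {roc_area(labels, scores):.4f}")
```

Reporting the error-based measures alongside accuracy matters because two classifiers with the same accuracy can differ in how confident their wrong predictions are, which RMSE and RRSE capture and accuracy does not.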
