Conference: Artificial Intelligence Applications and Innovations

DESIGN AND IMPLEMENT COST-SENSITIVE EMAIL FILTERING ALGORITHMS

Abstract

The growing problem of unsolicited bulk e-mail, also known as "spam", has generated a need for reliable anti-spam e-mail filters. We introduce seven filtering algorithms: Naive Bayesian (NB), Decision Tree (DT), AdaBoost, ANN, SVM, VSM and KNN. We discuss design considerations and implementation issues for these filters, such as how to make NB, SVM, VSM and KNN cost-sensitive. Using two relatively large collections of real personal e-mail, we conducted a comprehensive comparative study of the seven filters, based on a cost-sensitive measure that we adopted. The study covers the effects of feature subset size and training-corpus distribution, issues that have not been explored in previous experiments. The comparative results show that the cost-sensitive filters (NB, SVM, VSM and KNN) misclassify fewer legitimate messages when the relevant parameters, the feature subset size and the distribution of the training dataset are reasonable.
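
The abstract does not say how the filters are made cost-sensitive. One common way to do it for Naive Bayes, shown here purely as an illustrative sketch and not as the authors' method, is to block a message only when the posterior odds of spam exceed a cost ratio, i.e. when P(spam | x) > λ / (1 + λ). The function names, the `cost_ratio` parameter (λ) and the toy corpus below are all assumptions made for this example.

```python
import math
from collections import Counter

CLASSES = ("spam", "legit")

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model with Laplace smoothing.

    docs: list of token lists; labels: parallel list of "spam" / "legit".
    Assumes both classes occur at least once in the training data.
    """
    counts = {c: Counter() for c in CLASSES}
    n_docs = Counter(labels)
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
    vocab = set(counts["spam"]) | set(counts["legit"])
    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}}
    for c in CLASSES:
        total = sum(counts[c].values())
        model["log_prior"][c] = math.log(n_docs[c] / len(labels))
        model["log_lik"][c] = {
            w: math.log((counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return model

def spam_probability(model, tokens):
    """Return P(spam | tokens); words outside the vocabulary are ignored."""
    scores = {}
    for c in CLASSES:
        score = model["log_prior"][c]
        for w in tokens:
            if w in model["vocab"]:
                score += model["log_lik"][c][w]
        scores[c] = score
    # Subtract the max log-score before exponentiating for numerical stability.
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    return exp["spam"] / (exp["spam"] + exp["legit"])

def classify(model, tokens, cost_ratio=9.0):
    """Label a message "spam" only if P(spam | tokens) > λ / (1 + λ).

    cost_ratio (λ) is an assumed parameter: how many times worse it is to
    block a legitimate message than to let a spam message through.
    """
    threshold = cost_ratio / (1.0 + cost_ratio)
    return "spam" if spam_probability(model, tokens) > threshold else "legit"

# Toy corpus, invented for illustration only.
docs = [
    ["cheap", "pills"],
    ["cheap", "offer"],
    ["meeting", "tomorrow"],
    ["project", "meeting"],
]
labels = ["spam", "spam", "legit", "legit"]
model = train_nb(docs, labels)
message = ["cheap", "pills"]

print(spam_probability(model, message))
print(classify(model, message, cost_ratio=1.0))
print(classify(model, message, cost_ratio=9.0))
```

With λ = 1 the threshold is 0.5 and the filter behaves like an ordinary classifier; raising λ moves the threshold toward 1, so borderline messages are delivered rather than blocked. This trades missed spam for fewer misclassified legitimate messages, which is the kind of error the abstract's comparison focuses on.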