Better Naive Bayes classification for high-precision spam detection

Abstract

Email spam has become a major problem for Internet users and providers. One major obstacle to its eradication is that potential solutions must ensure a very low false-positive rate (FPR), which tends to be difficult in practice. We address the problem of low-FPR classification in the context of naive Bayes, one of the most popular machine learning models applied in the spam filtering domain. Drawing on recent extensions, we propose a new term weight aggregation function, which leads to markedly better results than the standard alternatives. We identify short instances as ones with disproportionately poor performance and counter this behavior with a collaborative filtering-based feature augmentation. Finally, we propose a tree-based classifier cascade for which the decision thresholds of the leaf nodes are jointly optimized for the best overall performance. These improvements, both individually and in aggregate, lead to substantially better detection rates at high precision when compared with some of the best variants of naive Bayes proposed to date.
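
For readers unfamiliar with the setting, the following is a minimal sketch of the baseline the abstract starts from: a multinomial naive Bayes scorer that sums per-token log-odds weights, with a decision threshold chosen so that the false-positive rate on held-out legitimate mail stays under a target. It is not the authors' method. The paper's new aggregation function, feature augmentation, and classifier cascade are not reproduced here, and the function names, the Laplace smoothing constant `alpha`, and the toy messages are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Fit per-token log-odds weights for a multinomial naive Bayes model.

    docs: list of token lists; labels: 1 = spam, 0 = ham.
    alpha is a Laplace smoothing constant (an assumption, not from the paper).
    Both classes must have at least one training document.
    """
    counts = {0: Counter(), 1: Counter()}
    n_docs = {0: 0, 1: 0}
    for tokens, y in zip(docs, labels):
        counts[y].update(tokens)
        n_docs[y] += 1
    vocab = set(counts[0]) | set(counts[1])
    totals = {y: sum(counts[y].values()) + alpha * len(vocab) for y in (0, 1)}
    weights = {
        t: math.log((counts[1][t] + alpha) / totals[1])
        - math.log((counts[0][t] + alpha) / totals[0])
        for t in vocab
    }
    prior = math.log(n_docs[1] / n_docs[0])
    return weights, prior

def score(tokens, weights, prior):
    """Aggregate token weights by plain summation; unseen tokens contribute 0."""
    return prior + sum(weights.get(t, 0.0) for t in tokens)

def threshold_for_fpr(ham_scores, max_fpr):
    """Return a threshold such that at most max_fpr of the ham scores exceed it."""
    ranked = sorted(ham_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))      # ham messages we may misclassify
    allowed = min(allowed, len(ranked) - 1)   # keep the index in range
    return ranked[allowed]                    # flag a message only if score > threshold

# Toy data (hypothetical), reused as the held-out set purely to keep the sketch short.
docs = [
    ["cheap", "pills", "now"],
    ["meeting", "at", "noon"],
    ["cheap", "meeting", "room"],
    ["win", "cheap", "pills"],
]
labels = [1, 0, 0, 1]

weights, prior = train_nb(docs, labels)
ham_scores = [score(d, weights, prior) for d, y in zip(docs, labels) if y == 0]
tau = threshold_for_fpr(ham_scores, max_fpr=0.0)
is_spam = score(["cheap", "pills"], weights, prior) > tau
```

Setting the threshold from the ham score distribution, rather than fixing it at zero, is what lets the false-positive rate be controlled directly; in practice the threshold would be tuned on a validation set separate from the training data.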

Bibliographic details

  • Source
    Software, 2009, Issue 11, pp. 1003-1024 (22 pages)
  • Author affiliations

    Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, U.S.A.;

    Microsoft Live Labs, One Microsoft Way, Redmond, WA 98052, U.S.A.;

    College of Information Science and Technology, The Pennsylvania State University, University Park, PA 16802, U.S.A.;

  • Indexing: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    spam filtering; naive Bayes; cascaded models;

  • Date added: 2022-08-17 13:03:54
