Published in: Intelligent Data Analysis
An evaluation of Naive Bayes variants in content-based learning for spam filtering

Abstract

We describe an in-depth analysis of the spam-filtering performance of a simple Naive Bayes learner and two extended variants. For the evaluation we used a set of seven mailboxes comprising about 65,000 emails from seven different users, as well as a representative snapshot of 25,000 emails received by a single user over 18 weeks. Our main motivation was to test whether two extended variants of Naive Bayes learning, SA-Train and CRM114, are superior to simple Naive Bayes learning, represented by SpamBayes. Surprisingly, we found that the performance of these systems was remarkably similar, and that the extended systems have significant weaknesses that are not apparent in the simpler Naive Bayes learner. The simpler learner, SpamBayes, also offers the most stable performance, in that it deteriorates least over time. Overall, SpamBayes should be preferred over the more complex variants.
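For readers unfamiliar with what "simple Naive Bayes learning" means in content-based spam filtering, the sketch below shows a textbook multinomial Naive Bayes classifier over word tokens with add-one (Laplace) smoothing. It is a hypothetical illustration only: the class name `NaiveBayesSpamFilter`, the `tokenize` helper, and the smoothing choice are assumptions, and SpamBayes, SA-Train and CRM114 each use their own tokenizers and scoring rules, which this sketch does not reproduce.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens; a deliberately crude tokenizer for illustration."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical multinomial Naive Bayes spam filter with Laplace smoothing.

    Not the scoring used by SpamBayes, SA-Train or CRM114; a generic sketch.
    """

    def __init__(self) -> None:
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        # Accumulate per-class token frequencies and per-class message counts.
        self.token_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_log_odds(self, text: str) -> float:
        # log P(spam | x) - log P(ham | x); the shared evidence term cancels.
        # Assumes at least one message of each class has been trained.
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        score = 0.0
        for label, sign in (("spam", 1.0), ("ham", -1.0)):
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(vocab)  # add-one smoothing
            log_p = math.log(self.doc_counts[label] / total_docs)  # class prior
            for tok in tokenize(text):
                if tok in vocab:  # tokens never seen in training are ignored
                    log_p += math.log((counts[tok] + 1) / denom)
            score += sign * log_p
        return score

    def classify(self, text: str) -> str:
        return "spam" if self.spam_log_odds(text) > 0 else "ham"


nb = NaiveBayesSpamFilter()
nb.train("cheap pills buy now", "spam")
nb.train("win money now", "spam")
nb.train("meeting agenda for monday", "ham")
nb.train("lunch on monday?", "ham")
print(nb.classify("buy cheap pills"))         # spam
print(nb.classify("agenda for the meeting"))  # ham
```

The "independence" assumption is visible in `spam_log_odds`: each token contributes its own log-probability term, and the terms are simply summed, which is what keeps the learner cheap to train and update incrementally as new mail arrives.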