An experimental comparison of naive Bayesian and keyword-based anti-spam filtering with personal e-mail messages

Abstract

The growing problem of unsolicited bulk e-mail, also known as "spam", has generated a need for reliable anti-spam e-mail filters. Filters of this type have so far been based mostly on manually constructed keyword patterns. An alternative approach has recently been proposed, whereby a Naive Bayesian classifier is trained automatically to detect spam messages. We test this approach on a large collection of personal e-mail messages, which we make publicly available in "encrypted" form contributing towards standard benchmarks. We introduce appropriate cost-sensitive measures, investigating at the same time the effect of attribute-set size, training-corpus size, lemmatization, and stop lists, issues that have not been explored in previous experiments. Finally, the Naive Bayesian filter is compared, in terms of performance, to a filter that uses keyword patterns, and which is part of a widely used e-mail reader.
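The idea of a trained Naive Bayesian filter with a cost-sensitive decision can be sketched in a few lines. This is a minimal illustration, not the authors' experimental setup: the toy messages, the function names (`train`, `spam_probability`, `is_spam`), the add-one smoothing, and the decision threshold `lam / (1 + lam)` are all assumptions made for the example. The parameter `lam` stands for how many times worse it is to block a legitimate message than to let a spam message through.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (tokens, label) pairs, label is "spam" or "legit".
TRAIN = [
    (["win", "free", "money"], "spam"),
    (["free", "offer", "now"], "spam"),
    (["meeting", "tomorrow", "agenda"], "legit"),
    (["project", "meeting", "notes"], "legit"),
]

def train(messages):
    """Count documents and word occurrences per class."""
    doc_counts = Counter()
    word_counts = {"spam": Counter(), "legit": Counter()}
    vocab = set()
    for tokens, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab

def spam_probability(tokens, model):
    """Return P(spam | tokens) under a naive (word-independence) assumption.

    Assumes both classes appear in the training data; words outside the
    training vocabulary are ignored.
    """
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in ("spam", "legit"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for tok in tokens:
            if tok in vocab:
                # Add-one (Laplace) smoothing avoids zero probabilities.
                count = word_counts[label][tok] + 1
                score += math.log(count / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    spam = math.exp(log_scores["spam"] - top)
    legit = math.exp(log_scores["legit"] - top)
    return spam / (spam + legit)

def is_spam(tokens, model, lam=1.0):
    """Flag a message as spam only if P(spam) exceeds lam / (1 + lam)."""
    return spam_probability(tokens, model) > lam / (1.0 + lam)

model = train(TRAIN)
```

With this toy data, a message such as `["free", "money"]` is flagged when `lam=1` (threshold 0.5) but not when `lam=9` (threshold 0.9), which shows how a higher misclassification cost makes the filter more conservative about blocking mail.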
