Journal: Information Retrieval

A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists


Abstract

This paper presents an extensive empirical evaluation of memory-based learning in the context of anti-spam filtering, a novel cost-sensitive application of text categorization that attempts to automatically identify the unsolicited commercial messages that flood mailboxes. Focusing on anti-spam filtering for mailing lists, we thoroughly investigate the effectiveness of a memory-based anti-spam filter using a publicly available corpus. The investigation covers different attribute-weighting and distance-weighting schemes, and studies the effect of the neighborhood size, the size of the attribute set, and the size of the training corpus. Three different cost scenarios are identified, and suitable cost-sensitive evaluation functions are employed. We conclude that memory-based anti-spam filtering for mailing lists is practically feasible, especially when combined with additional safety nets. Compared to a previously tested Naive Bayes filter, the memory-based filter performs better on average, particularly when the misclassification cost for non-spam messages is high.
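To make the idea of a cost-sensitive memory-based filter concrete, the following is a minimal sketch of a distance-weighted k-nearest-neighbor classifier over binary attribute vectors. It is not the authors' implementation. The overlap distance, the `1 / (1 + d)` vote weight, the toy `TRAIN` data, and the `cost_ratio` parameter with its decision threshold `cost_ratio / (1 + cost_ratio)` are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
# Illustrative sketch only: names, weights, and data are hypothetical.

def overlap_distance(a, b):
    """Count the attributes on which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def knn_spam_score(query, training, k=3):
    """Distance-weighted share of spam votes among the k nearest neighbors.

    training is a list of (vector, label) pairs; label is "spam" or "legit".
    """
    neighbors = sorted(training, key=lambda ex: overlap_distance(query, ex[0]))[:k]
    votes = {"spam": 0.0, "legit": 0.0}
    for vec, label in neighbors:
        d = overlap_distance(query, vec)
        votes[label] += 1.0 / (1.0 + d)  # assumed weighting: closer neighbors count more
    return votes["spam"] / (votes["spam"] + votes["legit"])

def classify(query, training, k=3, cost_ratio=1.0):
    """Label as spam only if the spam score exceeds a cost-dependent threshold.

    cost_ratio (hypothetical) says how many times worse it is to block a
    legitimate message than to let a spam message through.
    """
    threshold = cost_ratio / (1.0 + cost_ratio)
    return "spam" if knn_spam_score(query, training, k) > threshold else "legit"

# Toy training set: each vector marks presence/absence of four made-up attributes.
TRAIN = [
    ((1, 1, 0, 0), "spam"),
    ((1, 1, 1, 0), "spam"),
    ((0, 0, 1, 1), "legit"),
    ((0, 0, 0, 1), "legit"),
    ((0, 1, 0, 1), "legit"),
]

if __name__ == "__main__":
    msg = (1, 1, 0, 0)
    print(classify(msg, TRAIN, k=3, cost_ratio=1.0))
    print(classify(msg, TRAIN, k=3, cost_ratio=9.0))
```

With this toy data, the same message is flagged as spam when both error types cost the same, but passes as legitimate once blocking a legitimate message is assumed to be nine times worse, because one of its three nearest neighbors is legitimate and the threshold rises to 0.9.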