Journal: IEEE Transactions on Knowledge and Data Engineering

Cosdes: A Collaborative Spam Detection System with a Novel E-Mail Abstraction Scheme


Abstract

E-mail communication is indispensable nowadays, but the e-mail spam problem continues to grow drastically. In recent years, collaborative spam filtering based on near-duplicate similarity matching has been widely discussed. The primary idea of similarity matching for spam detection is to maintain a database of known spam, built from user feedback, and use it to block subsequent near-duplicate spam. To achieve efficient similarity matching and reduce storage utilization, prior works mainly represent each e-mail by a succinct abstraction derived from the e-mail's content text. However, these abstractions cannot fully capture the evolving nature of spam and are thus not effective enough in near-duplicate detection. In this paper, we propose a novel e-mail abstraction scheme that represents e-mails by their layout structure. We present a procedure to generate the e-mail abstraction from the HTML content of an e-mail, and this newly devised abstraction can more effectively capture the near-duplicate phenomenon of spam. Moreover, we design a complete spam detection system, Cosdes (COllaborative Spam DEtection System), which possesses an efficient near-duplicate matching scheme and a progressive update scheme. The progressive update scheme enables Cosdes to keep the most up-to-date information for near-duplicate detection. We evaluate Cosdes on a live data set collected from a real e-mail server and show that our system outperforms prior approaches in detection results and is applicable to the real world.
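The abstract does not spell out the abstraction procedure, so the following is only a minimal sketch of the general idea, not the Cosdes algorithm itself. It assumes the layout abstraction is simply the ordered sequence of HTML opening tags, and that a near-duplicate is an e-mail whose sequence exactly matches one already reported by users. The names `layout_abstraction`, `SpamDatabase`, and `min_reports` are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class TagSequenceParser(HTMLParser):
    """Collects opening-tag names in document order, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        self.tags.append(tag)


def layout_abstraction(html):
    """Return the tag sequence of an HTML e-mail body as a tuple.

    Assumption: the tag sequence stands in for the layout structure;
    the paper's actual abstraction may differ.
    """
    parser = TagSequenceParser()
    parser.feed(html)
    parser.close()
    return tuple(parser.tags)


class SpamDatabase:
    """Toy known-spam store built from user reports (hypothetical design)."""

    def __init__(self):
        self.reports = {}  # abstraction -> number of user reports

    def report(self, html):
        """Record one user report for the layout of this e-mail."""
        key = layout_abstraction(html)
        self.reports[key] = self.reports.get(key, 0) + 1

    def is_near_duplicate(self, html, min_reports=1):
        """True if this layout has been reported at least min_reports times."""
        return self.reports.get(layout_abstraction(html), 0) >= min_reports


if __name__ == "__main__":
    spam = '<html><body><table><tr><td><a href="http://x.example">Buy now</a></td></tr></table></body></html>'
    variant = '<html><body><table><tr><td><a href="http://y.example">Cheap pills!!</a></td></tr></table></body></html>'
    db = SpamDatabase()
    db.report(spam)
    print(db.is_near_duplicate(variant))
```

Because the text and link targets are ignored, a spammer who rewrites the wording but reuses the same template still produces the same key, which is the intuition behind abstracting layout rather than content. A real system would also need approximate matching and the aging of old entries, which this sketch leaves out.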
