Towards improving e-mail content classification for spam control: Architecture, abstraction, and strategies.

Abstract

This dissertation discusses techniques to improve the effectiveness and efficiency of spam control. Specifically, layer-3 e-mail content classification is proposed to allow e-mail pre-classification (for fast spam detection at receiving e-mail servers) and distributed processing at network nodes for fast spam detection at spam control points, e.g., at e-mail servers. Fast spam detection also lets receiving e-mail servers prioritize e-mail servicing, safeguarding non-spam deliveries even under heavy spam traffic, and allows spam to be rejected during Simple Mail Transfer Protocol (SMTP) sessions for inbound and outbound spam control. The dissertation makes four contributions.

In our first contribution, we propose a hardware architecture for a naive Bayes content classification unit for high-throughput spam detection. We use the logarithmic number system to simplify the naive Bayes computation. Because logarithmic number system computation is fast but lossy, we analyze the noise model of our hardware architecture. Through noise analysis, synthesis, and verification by numerical simulation, we show that the naive Bayes classification unit, implemented on an FPGA, can process more than one hundred million features per second with very low computation noise, an order of magnitude faster than a general-purpose processor implementation.

In our second contribution, we propose e-mail content pre-classification at the network layer (layer 3) instead of at the application layer (layer 7), as is currently practiced, to allow e-mail packet pre-classification and distributed processing for effective spam detection beyond server implementations. By performing e-mail content classification at a lower abstraction level, e-mail packets can be pre-processed, without reassembly, at any network node between sender and receiver. We demonstrate that naive Bayes e-mail content classification can be adapted for layer-3 processing. We also show that fast e-mail class estimation can be performed at receiving e-mail servers. Through simulation using e-mail data sets, we show that layer-3 e-mail content classification can detect spam with accuracy and false positive values approximately equal to those at layer 7.

In our third contribution, we propose a prioritized e-mail servicing scheme that uses a priority queuing approach to improve spam handling at receiving e-mail servers. In this scheme, non-spam e-mail is given higher priority than spam. Four servicing strategies for the proposed scheme are studied. We analyze the performance of this scheme under different e-mail traffic loads and service capacities. We show that the non-spam delay and loss probability can be reduced when the server is under-provisioned.

In our fourth contribution, we propose a spam handling scheme that rejects spam during SMTP sessions. The proposed scheme allows inbound and outbound spam control. It can reduce server load and hence non-spam queuing delay and loss probability. We analyze the performance of this scheme under different e-mail traffic loads and service capacities. We show that the non-spam delay and loss probability can be reduced when the server is under-provisioned.

In this dissertation, we present four techniques to improve spam control through e-mail content classification. We envision that our proposed approaches complement rather than replace current spam control systems. The four proposed approaches can work with existing spam control systems and support proactive control of spam and other e-mail-based threats, such as phishing and e-mail worms, anywhere across the Internet.
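
The idea of scoring e-mail packets without reassembly can be illustrated in software. The following is a minimal sketch, not the dissertation's method: it assumes a bag-of-words naive Bayes model with Laplace smoothing, floating-point log-probabilities in place of the hardware logarithmic number system, and packets that arrive already split into whole tokens. All names (`train`, `packet_score`, `RunningClassifier`, `TRAIN`) and the toy training data are hypothetical. Working in the log domain turns the product of per-feature likelihoods into a sum, so each packet contributes an independent partial score that can be accumulated in any order.

```python
import math
from collections import Counter

# Toy labelled corpus (hypothetical data, for illustration only).
TRAIN = [
    (["cheap", "pills", "buy"], "spam"),
    (["buy", "now", "cheap"], "spam"),
    (["meeting", "agenda", "monday"], "ham"),
    (["project", "meeting", "notes"], "ham"),
]


def train(messages, alpha=1.0):
    """Return (bias, weights) for a log-domain naive Bayes spam score.

    weights[t] = log P(t | spam) - log P(t | ham), with Laplace smoothing.
    bias       = log P(spam) - log P(ham).
    Assumes both classes occur at least once in `messages`.
    """
    counts = {"spam": Counter(), "ham": Counter()}
    docs = Counter()
    for tokens, label in messages:
        counts[label].update(tokens)
        docs[label] += 1
    vocab = set(counts["spam"]) | set(counts["ham"])
    totals = {label: sum(c.values()) for label, c in counts.items()}

    def log_like(label, token):
        num = counts[label][token] + alpha
        den = totals[label] + alpha * len(vocab)
        return math.log(num / den)

    weights = {t: log_like("spam", t) - log_like("ham", t) for t in vocab}
    bias = math.log(docs["spam"] / docs["ham"])
    return bias, weights


def packet_score(tokens, weights):
    """Partial log-likelihood ratio of one packet; unknown tokens add 0."""
    return sum(weights.get(t, 0.0) for t in tokens)


class RunningClassifier:
    """Accumulates per-packet partial scores for a single message."""

    def __init__(self, bias, weights):
        self.weights = weights
        self.score = bias

    def feed(self, packet_tokens):
        # Each packet is scored on its own; no reassembly is needed.
        self.score += packet_score(packet_tokens, self.weights)
        return self.score

    def is_spam(self, threshold=0.0):
        # A positive score means the spam hypothesis is more likely.
        return self.score > threshold
```

Because the score is a plain sum, feeding packets one at a time gives the same result (up to floating-point rounding) as scoring the whole message at once, which is the property that makes an early class estimate possible before the last packet arrives. A real deployment would also have to handle tokens that straddle packet boundaries, which this sketch ignores.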

Bibliographic record

  • Author

    Marsono, Muhammad Nadzir.

  • Author affiliation

    University of Victoria (Canada).

  • Degree-granting institution: University of Victoria (Canada).
  • Subject: Engineering, Electronics and Electrical; Computer Science.
  • Degree: Ph.D.
  • Year: 2007
  • Pages: 162 p.
  • Original format: PDF
  • Language of text: English (eng)
