
A large-scale empirical analysis of email spam detection through network characteristics in a stand-alone enterprise


Abstract

Spam is a never-ending issue that constantly consumes resources to no useful end. In this paper, we envision spam filtering as a pipeline consisting of DNS blacklists, filters based on SYN packet features, filters based on traffic characteristics, and filters based on message content. Each stage of the pipeline examines more information in the message but is more computationally expensive. A message is rejected as spam once any layer is sufficiently confident. We analyze this pipeline, focusing on the first three layers, from a single-enterprise perspective. To do this, we use a large email dataset collected over two years. We devise a novel ground truth determination system that allows us to label this large dataset accurately. Using two machine learning algorithms, we study (i) how the different pipeline layers interact with each other and the value added by each layer, (ii) the utility of individual features in each layer, (iii) the stability of the layers across time and network events, and (iv) an operational use case investigating whether this architecture can be practically useful. We find that (i) the pipeline architecture is generally useful in terms of accuracy as well as in an operational setting, (ii) it generally ages gracefully across long time periods, and (iii) in some cases, later layers can compensate for poor performance in the earlier layers. Among the caveats, we find that (i) the utility of network features is not as high from the single-enterprise viewpoint as reported in prior work, (ii) major network events can sharply affect the detection rate, and (iii) the operational (computational) benefit of the pipeline may depend on the efficiency of the final content filter.
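The early-rejection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stage names follow the abstract, but the `Stage` type, the `classify` function, the score fields, the thresholds, and the blacklist entry are all hypothetical placeholders chosen for illustration. Each stage returns a spam confidence in [0, 1], and a message is rejected as soon as one stage meets its threshold, so later and more expensive stages run only when needed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Message = Dict[str, object]

# Hypothetical blacklist; 203.0.113.0/24 is a documentation-only range.
BLACKLISTED_IPS = {"203.0.113.7"}


@dataclass
class Stage:
    name: str
    score: Callable[[Message], float]  # spam confidence in [0, 1]
    threshold: float                   # reject when score >= threshold


def dnsbl_score(msg: Message) -> float:
    # Cheapest check: is the sender's IP on the (hypothetical) blacklist?
    return 1.0 if msg.get("ip") in BLACKLISTED_IPS else 0.0


def field_score(field: str) -> Callable[[Message], float]:
    # Stand-in for a trained classifier: reads a precomputed score
    # from the message, defaulting to 0.0 when it is absent.
    return lambda msg: float(msg.get(field, 0.0))


# Stages ordered from cheapest to most expensive, as in the abstract.
STAGES: List[Stage] = [
    Stage("dnsbl", dnsbl_score, 1.0),
    Stage("syn", field_score("syn_score"), 0.9),
    Stage("traffic", field_score("traffic_score"), 0.9),
    Stage("content", field_score("content_score"), 0.5),
]


def classify(msg: Message, stages: List[Stage]) -> Tuple[str, Optional[str]]:
    """Return ("spam", stage_name) at the first confident stage,
    or ("accept", None) if no stage is confident enough."""
    for stage in stages:
        if stage.score(msg) >= stage.threshold:
            return "spam", stage.name
    return "accept", None
```

In this sketch, the share of messages that reach the final `content` stage determines how much computation the earlier stages save, which mirrors the abstract's caveat that the operational benefit depends on the cost of the content filter.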

Bibliographic details

  • Source
    Computer Networks, 2014, Issue 11, pp. 101-121 (21 pages)
  • Author affiliations

    Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA;

    Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA;

    International Computer Science Institute, Berkeley, CA, USA;

    Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH, USA;

  • Original format: PDF
  • Language: English
  • Keywords

    Spam; Network-level characteristics; Longitudinal analysis;
