Conference: IEEE International Conference on Computer-Aided Industrial Design & Conceptual Design

Filtering image-based spam using multifractal analysis and active learning feedback-driven semi-supervised support vector machine

Abstract

Traditional anti-spam technologies cannot block image-based spam, because spammers use a variety of image-generation and randomization algorithms that keep a message fully legible to the human eye but indistinguishable to most anti-spam engines. In this paper we propose a novel composite method that filters image-based spam accurately and effectively and can easily be implemented as a SpamAssassin plug-in. Our method exploits two characteristics of image-based spam: (1) it is sent in large quantities of similar messages, and (2) its characters are highly variable. For the first characteristic, we use SpamAssassin rules to check each email's characteristics: if the rules identify a new email as spam, it is blocked; otherwise, image-based mail is captured by the plug-in. For the second characteristic, the plug-in applies multifractal analysis within a multi-orientation wavelet pyramid to obtain a texture descriptor of the image-based email that is strongly invariant to many factors; uses a hybrid filter-wrapper feature subset selection algorithm based on particle swarm optimization to remove redundant or irrelevant features from that descriptor; uses a semi-supervised support vector machine (SVM) classifier to decide whether an email is ham or spam; and then uses active-learning clustering to select the most representative emails for relabeling through user feedback. The emails relabeled through user feedback, together with the unlabeled emails the SVM flags as suspected spam, are used to retrain the classifier and improve the filter's accuracy. The experimental results show that our method achieves high efficiency, high accuracy, and a low false positive rate, and that accuracy rises and the false positive rate falls with successive rounds of retraining. The method is therefore particularly well suited to adversarial learning problems such as spam filtering.
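The two-stage routing described in the abstract can be sketched as follows. This models the control flow only, assuming three hypothetical callables (`rules_flag_spam`, `has_image`, `image_classifier`) that stand in for the SpamAssassin rule set, an image-attachment check, and the plug-in's classifier. Real SpamAssassin plug-ins are Perl modules, so none of these names belong to its API.

```python
from typing import Callable


def route_email(email: str,
                rules_flag_spam: Callable[[str], bool],
                has_image: Callable[[str], bool],
                image_classifier: Callable[[str], bool]) -> str:
    """Hypothetical two-stage dispatch: rule engine first, image plug-in second."""
    # Stage 1: mail the rules already flag is blocked without any image analysis.
    if rules_flag_spam(email):
        return "blocked_by_rules"
    # Stage 2: only image-based mail that passed the rules reaches the plug-in.
    if has_image(email):
        return "spam" if image_classifier(email) else "ham"
    # Text-only mail that passed the rules is outside the plug-in's scope.
    return "ham"
```

Running the cheap rule check first means the costlier image pipeline only sees mail the rules let through.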
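The texture descriptor builds on multifractal analysis. The sketch below is a deliberately simpler stand-in, not the paper's multi-orientation wavelet pyramid: it estimates the mass exponents τ(q) of one grayscale image by box counting, treating pixel intensity as a measure μ and fitting the slope of log Σᵢ μᵢ(s)^q against log s, then converts them to generalized dimensions D_q = τ(q)/(q - 1). It assumes the image is a 2D list of non-negative numbers with at least one non-zero pixel; the function names, q values, and box sizes are hypothetical choices.

```python
import math


def box_measures(img, s):
    """Normalized intensity mass of each non-empty s-by-s box (border remainder dropped)."""
    h, w = len(img), len(img[0])
    raw = [sum(img[i][j] for i in range(r, r + s) for j in range(c, c + s))
           for r in range(0, h - s + 1, s)
           for c in range(0, w - s + 1, s)]
    total = sum(raw)
    return [m / total for m in raw if m > 0]


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def mass_exponents(img, qs=(-2, 0, 2), sizes=(1, 2, 4, 8)):
    """tau(q): slope of log sum_i mu_i(s)^q versus log s over the given box sizes."""
    taus = {}
    for q in qs:
        log_s = [math.log(s) for s in sizes]
        log_z = [math.log(sum(mu ** q for mu in box_measures(img, s))) for s in sizes]
        taus[q] = slope(log_s, log_z)
    return taus


def generalized_dimensions(taus):
    """D_q = tau(q) / (q - 1); q = 1 is skipped because the formula is undefined there."""
    return {q: t / (q - 1) for q, t in taus.items() if q != 1}
```

A uniform image gives D_q = 2 for every q and a single bright pixel gives D_q = 0; a non-uniform texture makes D_q vary with q, which is what lets the spectrum act as a feature vector.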
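The hybrid filter-wrapper feature selection step could look roughly like this. The abstract gives neither the fitness function nor the PSO parameters, so this sketch assumes a common arrangement: a filter stage keeps the `keep` features with the highest Fisher score, and a wrapper stage runs binary particle swarm optimization (the sigmoid-transfer variant) over the survivors, scoring each subset by leave-one-out nearest-centroid accuracy minus a small size penalty. The classifier, the constants (0.7 inertia, 1.5 acceleration), and all function names are hypothetical.

```python
import math
import random


def fisher_score(X, y, j):
    """Filter criterion: between-class over within-class scatter of feature j."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row[j])
    overall = sum(row[j] for row in X) / len(X)
    between = within = 0.0
    for vals in groups.values():
        m = sum(vals) / len(vals)
        between += len(vals) * (m - overall) ** 2
        within += sum((v - m) ** 2 for v in vals)
    return between / (within + 1e-12)


def loo_centroid_accuracy(X, y, feats):
    """Wrapper criterion: leave-one-out accuracy of a nearest-centroid classifier."""
    if not feats:
        return 0.0
    correct = 0
    for i in range(len(X)):
        groups = {}
        for k, (row, label) in enumerate(zip(X, y)):
            if k != i:
                groups.setdefault(label, []).append([row[j] for j in feats])
        cents = {lab: [sum(col) / len(col) for col in zip(*rows)]
                 for lab, rows in groups.items()}
        pred = min(cents, key=lambda lab: sum((X[i][j] - c) ** 2
                                              for j, c in zip(feats, cents[lab])))
        correct += pred == y[i]
    return correct / len(X)


def hybrid_pso_select(X, y, keep=4, particles=8, iters=20, penalty=0.01, seed=0):
    """Filter to the `keep` best Fisher-scored features, then search subsets with binary PSO."""
    rng = random.Random(seed)
    ranked = sorted(range(len(X[0])), key=lambda j: fisher_score(X, y, j), reverse=True)
    pool = ranked[:keep]
    dims = len(pool)

    def fitness(bits):
        # Reward classification accuracy, lightly penalize subset size.
        chosen = [j for j, b in zip(pool, bits) if b]
        return loo_centroid_accuracy(X, y, chosen) - penalty * len(chosen)

    pos = [[rng.randint(0, 1) for _ in range(dims)] for _ in range(particles)]
    vel = [[0.0] * dims for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(particles):
            for d in range(dims):
                v = (0.7 * vel[i][d]
                     + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                     + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))
                # Binary PSO: sigmoid(velocity) is the probability that the bit is 1.
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-vel[i][d])) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return sorted(j for j, b in zip(pool, gbest) if b)
```

The filter stage is cheap and shrinks the search space; the wrapper stage is costlier per evaluation but judges features by how well they actually classify, which is the usual reason for combining the two.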
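The abstract does not say which semi-supervised SVM formulation the paper uses. As an assumption, the sketch below substitutes the simplest one, self-training (rather than, say, a transductive SVM): a linear SVM, trained here by Pegasos-style sub-gradient descent so no external library is needed, labels the unlabeled emails it scores outside the margin, adds them to the training set, and retrains. Samples are plain feature lists, labels are +1 for spam and -1 for ham, and the names, the regularization `lam`, and `threshold` are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient descent on the L2-regularized hinge loss.

    Labels are +1 (spam) or -1 (ham). A constant 1.0 is appended to every
    sample so the last weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision(w, x):
    """Signed SVM score; positive means spam."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def self_train(X_lab, y_lab, X_unl, rounds=3, threshold=1.0):
    """Self-training: adopt the SVM's own labels for confidently scored unlabeled mail."""
    X, y, pool = [list(x) for x in X_lab], list(y_lab), [list(x) for x in X_unl]
    for _ in range(rounds):
        w = train_linear_svm(X, y)
        scores = [decision(w, x) for x in pool]
        confident = [(x, s) for x, s in zip(pool, scores) if abs(s) >= threshold]
        if not confident:
            break
        for x, s in confident:
            X.append(x)
            y.append(1 if s > 0 else -1)
        pool = [x for x, s in zip(pool, scores) if abs(s) < threshold]
    # Whatever is left is the low-confidence pool the SVM could not settle.
    return train_linear_svm(X, y), pool
```

The samples left in the returned pool are the ones the classifier is least sure about, which makes them natural candidates for user relabeling.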
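For the active-learning step, one plausible reading of "active learning clustering" is: cluster the emails awaiting feedback and ask the user about the single email closest to each cluster center, so that k questions cover k distinct kinds of message. The sketch assumes plain k-means with deterministic farthest-point initialization on feature vectors; the abstract does not name the clustering algorithm, and `kmeans` and `pick_for_relabeling` are hypothetical names.

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    # Deterministic farthest-point initialization, starting from the first point.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; an empty cluster keeps its old center.
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters


def pick_for_relabeling(points, k):
    """One representative per non-empty cluster: the member nearest its center."""
    centers, clusters = kmeans(points, k)
    return [min(cl, key=lambda p: dist2(p, c)) for c, cl in zip(centers, clusters) if cl]
```

In the feedback loop the abstract describes, the user's labels for these representatives would join the training set before the next retraining round.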