Conference: 2014 5th International Conference - Confluence The Next Generation Information Technology Summit

A hybrid approach for spam filtering using local concentration based K-Means clustering


Abstract

Electronic mail (email) has become essential for Internet users. Many studies indicate that the number of Internet users is increasing day by day, and as the online population grows, the volume of email traffic grows with it. Unwanted emails make up 80% of this volume. These unwanted emails are known as spam and are also referred to as unsolicited bulk email (UBE); they are sent in bulk to a large number of recipients. The increased volume of spam makes maintaining an email inbox a very common problem. Spam is a major issue for the Internet community because it wastes resources and also pollutes our environment. Spam filtering is therefore an essential task for preventing these adverse effects. Researchers have proposed many techniques and algorithms for spam filtering, which focus on individual parameters of the malicious content. Spammers, however, have also become more intelligent and now attack the weak points of a filtering system. In this work we divide the entire filtering process into four stages. In the first stage, a string tokenizer generates terms from the incoming message. These tokens are passed to the second stage, where Information Gain (IG) is applied as the term selection strategy. The selected terms are then passed to the third stage, which consists of a Local Concentration based Artificial Immune System for feature selection. In the fourth stage, the newly constructed feature vectors are passed to the K-Means clustering algorithm for classification. In support of our work we conducted several experiments and give a comparative analysis against various existing methods on different parameters.
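The four stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract gives no algorithmic details, so the regular-expression tokenizer, the rule for splitting selected terms into spam and ham detector sets, the fixed number of windows, the farthest-first centroid initialization, and the majority-vote naming of clusters are all assumptions made here. Every function name and the toy messages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(message):
    """Stage 1: split a message into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", message.lower())


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def information_gain(term, docs, labels):
    """Stage 2: entropy reduction from knowing whether `term` occurs in a message."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    return (entropy(labels) - len(present) / n * entropy(present)
            - len(absent) / n * entropy(absent))


def build_detectors(docs, labels, top_m):
    """Keep the top_m terms by IG, then split them by the class they favor (assumed rule)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: information_gain(t, docs, labels), reverse=True)
    spam_docs = [d for d, y in zip(docs, labels) if y == "spam"]
    ham_docs = [d for d, y in zip(docs, labels) if y == "ham"]
    spam_set, ham_set = set(), set()
    for t in ranked[:top_m]:
        p_spam = sum(t in d for d in spam_docs) / len(spam_docs)
        p_ham = sum(t in d for d in ham_docs) / len(ham_docs)
        (spam_set if p_spam > p_ham else ham_set).add(t)
    return spam_set, ham_set


def local_concentrations(tokens, spam_set, ham_set, n_windows=2):
    """Stage 3: per-window fraction of tokens matching each detector set."""
    features = []
    size = max(1, math.ceil(len(tokens) / n_windows))
    for i in range(n_windows):
        window = tokens[i * size:(i + 1) * size]
        total = len(window) or 1  # an empty window yields concentrations of 0
        features.append(sum(t in spam_set for t in window) / total)
        features.append(sum(t in ham_set for t in window) / total)
    return features


def distance(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k=2, iterations=20):
    """Stage 4: Lloyd's K-Means with farthest-first initialization (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(distance(p, c) for c in centroids)))
    assignment = []
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda j: distance(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def label_clusters(assignment, labels, k=2):
    """Name each cluster after the majority training label of its members."""
    names = {}
    for j in range(k):
        members = [y for a, y in zip(assignment, labels) if a == j]
        if members:
            names[j] = Counter(members).most_common(1)[0][0]
    return names


def classify(message, spam_set, ham_set, centroids, names, n_windows=2):
    """Assign a new message the name of its nearest cluster centroid."""
    vec = local_concentrations(tokenize(message), spam_set, ham_set, n_windows)
    nearest = min(range(len(centroids)), key=lambda j: distance(vec, centroids[j]))
    return names[nearest]


# Toy data, invented for illustration only.
messages = ["win free money now", "free prize claim now",
            "meeting agenda for monday", "project report for review"]
labels = ["spam", "spam", "ham", "ham"]
docs = [set(tokenize(m)) for m in messages]
spam_set, ham_set = build_detectors(docs, labels, top_m=3)
points = [local_concentrations(tokenize(m), spam_set, ham_set) for m in messages]
assignment, centroids = kmeans(points, k=2)
names = label_clusters(assignment, labels, k=2)
print(classify("claim free money now", spam_set, ham_set, centroids, names))
```

Because K-Means is unsupervised, the sketch needs the extra `label_clusters` step to turn cluster indices into spam or ham decisions; how the paper performs this mapping is not stated in the abstract. On a real corpus the hand-written loop could be replaced by a library implementation such as scikit-learn's `KMeans`.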
