International Conference on Neural Information Processing (ICONIP 2010)

A Heuristic-Based Feature Selection Method for Clustering Spam Emails


Abstract

In recent years, many efforts have been made to cluster spam emails in order to cope with spam-based attacks. During clustering, many statistical features (e.g., email size) are used to calculate similarities between spam emails. In many cases, however, some of these features are redundant or contribute little to the clustering process. Feature selection is one of the most common methods for identifying a subset of key features from an initial set. In this paper, we propose a heuristic-based feature selection method for clustering spam emails. Unlike existing methods, which form combinations of the given features and evaluate them using data mining and machine learning techniques, our method evaluates each feature based only on its value distribution in spam clusters. With our method, we identified four significant features that yielded a clustering accuracy of 86.33% with low time complexity.
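
The abstract does not spell out the heuristic itself, so the following is only a minimal sketch of the general idea of scoring each feature on its own from its value distribution inside clusters. The scoring rule (within-cluster variance divided by overall variance) and the names `score_features` and `select_features` are assumptions made for illustration, not the method proposed in the paper.

```python
from collections import defaultdict
from statistics import pvariance


def score_features(rows, labels):
    """Score each feature independently (hypothetical rule, not the paper's).

    rows   : list of equal-length numeric feature vectors, one per email
    labels : cluster label for each row
    Returns {feature_index: score}; a lower score means the feature's values
    are tighter inside clusters relative to their overall spread.
    """
    by_cluster = defaultdict(list)
    for row, label in zip(rows, labels):
        by_cluster[label].append(row)

    scores = {}
    for j in range(len(rows[0])):
        total_var = pvariance([row[j] for row in rows])
        if total_var == 0:
            # A constant feature cannot separate clusters; give it the worst score.
            scores[j] = 1.0
            continue
        # Size-weighted average of the per-cluster variances of feature j.
        within = sum(
            len(members) * pvariance([m[j] for m in members])
            for members in by_cluster.values()
        ) / len(rows)
        scores[j] = within / total_var
    return scores


def select_features(rows, labels, k):
    """Return the indices of the k features with the lowest scores."""
    scores = score_features(rows, labels)
    return sorted(scores, key=scores.get)[:k]
```

Because each feature is scored in a single pass over the data and no feature combinations are enumerated, the cost grows linearly with the number of features, which is consistent with the low time complexity the abstract emphasizes.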
