International Conference on Neural Information Processing

A Heuristic-Based Feature Selection Method for Clustering Spam Emails

Abstract

In recent years, many efforts have been made to cluster spam emails in order to cope with spam-based attacks. During clustering, many statistical features (e.g., the size of an email) are used to calculate similarities between spam emails. In many cases, however, some of these features are redundant or contribute little to the clustering process. Feature selection is one of the most typical methods for identifying a subset of key features from an initial feature set. In this paper, we propose a heuristic-based feature selection method for clustering spam emails. Unlike existing methods, which build combinations of the given features and evaluate them using data mining and machine learning techniques, our method evaluates each feature solely according to its value distribution in spam clusters. With our method, we identified 4 significant features that yielded a clustering accuracy of 86.33% with low time complexity.
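The abstract does not spell out the heuristic itself, so the following is only a minimal sketch of the general idea of scoring each feature independently from its value distribution inside existing spam clusters. It assumes a hypothetical scoring rule: the size-weighted within-cluster purity of a feature (how often emails in the same cluster share that feature's most common value) minus the feature's purity over all emails, so that features which are consistent inside clusters but vary across clusters rank highest. The functions `purity`, `feature_score`, and `select_features`, the scoring rule, and the toy feature names are illustrative assumptions, not the authors' method.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def purity(values: Sequence[Hashable]) -> float:
    """Fraction of values equal to the most common value (1.0 = all identical)."""
    if not values:
        return 0.0
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values)


def feature_score(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Hypothetical heuristic: size-weighted within-cluster purity minus global purity.

    A high score means the feature takes consistent values inside each cluster
    while still varying across clusters; a constant feature scores 0.
    """
    by_cluster: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for value, label in zip(values, labels):
        by_cluster[label].append(value)
    n = len(values)
    within = sum(len(vs) / n * purity(vs) for vs in by_cluster.values())
    return within - purity(values)


def select_features(rows: List[Dict[str, Hashable]],
                    labels: Sequence[Hashable], k: int) -> List[str]:
    """Score every feature on its own (no feature combinations) and keep the top k."""
    if not rows:
        return []
    names = list(rows[0].keys())
    scores = {name: feature_score([row[name] for row in rows], labels) for name in names}
    # sorted() is stable even with reverse=True, so ties keep their original order.
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k]


if __name__ == "__main__":
    # Toy data: four spam emails already grouped into two clusters (hypothetical features).
    rows = [
        {"size_bucket": "small", "num_links": 3, "lang": "en", "subject_len": 10},
        {"size_bucket": "small", "num_links": 3, "lang": "en", "subject_len": 20},
        {"size_bucket": "large", "num_links": 1, "lang": "en", "subject_len": 10},
        {"size_bucket": "large", "num_links": 2, "lang": "en", "subject_len": 30},
    ]
    labels = [0, 0, 1, 1]
    print(select_features(rows, labels, 2))  # prints ['size_bucket', 'num_links']
```

Because each feature is scored on its own, the cost grows linearly with the number of emails times the number of features, whereas evaluating feature combinations means searching a space of subsets that grows exponentially with the feature count.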