Conference: International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises

An Unsupervised Feature Selection Method for Data-Driven Anomaly Detection Systems

Abstract

Feature selection is widely used as a pre-processing step to optimise the performance of data-driven intrusion/anomaly detection systems. For example, when data are grouped into normal and outlier groups, redundant and non-representative features reduce the accuracy of classifying the data points and increase the processing time. Feature selection is therefore applied before anomaly detection in order to optimise classification accuracy and running time. Most existing feature selection methods have limitations when dealing with high-dimensional data, because they search different subsets of features to find an accurate representation of all features. Searching over different combinations of features is computationally very expensive, which makes existing methods inefficient for high-dimensional data. The work carried out here concerns the design of a similarity-based unsupervised feature selection method for efficient and accurate anomaly detection (UFSAD). It mainly tackles the selection of a reduced set of representative features from high-dimensional data without the data class labels. The selected features should improve the accuracy and performance of anomaly detection systems by eliminating redundant and non-representative features. The proposed UFSAD method extends the k-means clustering algorithm to partition the features into k clusters based on a similarity measure (e.g. PCC - Pearson Correlation Coefficient, LSRE - Least Square Regression Error, or MICI - Maximal Information Compression Index). A centroid-based feature selection step then follows: the feature with the closest similarity to its cluster centroid is selected as the representative feature, while the others are discarded. Extensive experimental work has shown that UFSAD can generate a reduced, representative and non-redundant feature set that achieves good classification accuracy in comparison with well-known unsupervised feature selection methods.
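
The two steps named in the abstract (clustering the features with a k-means-style procedure under a similarity measure, then keeping the feature closest to each cluster centroid) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `select_features`, the farthest-first seeding, the sign alignment before averaging, and the use of absolute PCC as the only similarity measure are all assumptions made for illustration. The sketch also assumes that no feature column is constant.

```python
import numpy as np


def select_features(X, k, n_iter=50):
    """Cluster the columns of X by |Pearson correlation|, keep one per cluster.

    X is an (n_samples, n_features) array with no constant columns.
    Returns (selected_feature_indices, cluster_label_per_feature).
    """
    n_samples = X.shape[0]
    # One row per feature, standardised to zero mean and unit variance,
    # so a dot product divided by n_samples is a Pearson correlation.
    F = ((X - X.mean(axis=0)) / X.std(axis=0)).T

    def abs_corr(C):
        # |PCC| between every feature (rows of F) and every centroid (rows of C).
        scale = n_samples * C.std(axis=1) + 1e-12
        return np.abs(F @ C.T / scale)

    # Assumed initialisation (not from the abstract): farthest-first seeding,
    # which keeps the sketch deterministic.
    seeds = [0]
    while len(seeds) < k:
        closeness = abs_corr(F[seeds]).max(axis=1)
        closeness[seeds] = np.inf
        seeds.append(int(closeness.argmin()))
    centroids = F[seeds].copy()

    labels = np.full(F.shape[0], -1)
    for _ in range(n_iter):
        # Assignment step: each feature joins its most similar centroid.
        new_labels = abs_corr(centroids).argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: centroid becomes the mean of its member features.
        for j in range(k):
            members = F[labels == j]
            if len(members) == 0:
                continue  # keep the previous centroid for an empty cluster
            # Flip anti-correlated members so they do not cancel in the mean.
            signs = np.sign(members @ centroids[j])
            signs[signs == 0] = 1.0
            centroids[j] = (signs[:, None] * members).mean(axis=0)

    # Centroid-based selection: keep the member most similar to its centroid.
    sim = abs_corr(centroids)
    selected = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size:
            selected.append(int(idx[sim[idx, j].argmax()]))
    return selected, labels
```

Calling `select_features(X, k=3)` on a samples-by-features array returns up to three column indices plus a cluster label for every feature. Swapping in LSRE or MICI would only change the body of `abs_corr`, although a measure that is not scale-invariant would also need the standardisation step revisited.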
