An Unsupervised Feature Selection Method for Data-Driven Anomaly Detection Systems

Abstract

Feature selection is widely used as a pre-processing step to optimise the performance of data-driven intrusion/anomaly detection systems. For example, when data are grouped into normal and outlier groups, redundant and non-representative features reduce the accuracy of classifying the data points and increase the processing time. Feature selection is therefore applied before anomaly detection to improve both classification accuracy and running time. Most existing feature selection methods have limitations when dealing with high-dimensional data, because they search over different subsets of features to find an accurate representation of all features. Searching over feature combinations is computationally very expensive, which makes existing methods inefficient for high-dimensional data. The work carried out here concerns the design of a similarity-based unsupervised feature selection method for efficient and accurate anomaly detection (UFSAD). It mainly tackles the selection of a reduced set of representative features from high-dimensional data without the data class labels. The selected features should improve the accuracy and performance of anomaly detection systems through the elimination of redundant and non-representative features. The proposed UFSAD method extends the k-means clustering algorithm to partition the features into k clusters based on a similarity measure (e.g. PCC: Pearson Correlation Coefficient, LSRE: Least Square Regression Error, or MICI: Maximal Information Compression Index). A centroid-based feature selection step is then applied, in which the feature most similar to its cluster centroid is selected as the representative feature of that cluster and the others are discarded. Extensive experimental work has shown that UFSAD can generate a reduced, representative and non-redundant feature set that achieves good classification accuracy in comparison with well-known unsupervised feature selection methods.
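
The two steps named in the abstract (clustering the features by similarity, then keeping the feature closest to each centroid) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the algorithm's details, so the function names, the use of signed PCC as the only similarity measure, the mean-vector centroid, and the farthest-first initialisation are all assumptions made for illustration.

```python
import numpy as np

def standardise_features(X):
    """Centre each column and scale it to unit length, so dot products equal PCC."""
    Z = X - X.mean(axis=0)
    norms = np.linalg.norm(Z, axis=0)
    norms[norms == 0] = 1.0  # constant columns stay all-zero
    return Z / norms

def similarity_to_centroids(Z, centroids):
    """Signed PCC between every feature (column of Z) and every centroid (row)."""
    C = centroids - centroids.mean(axis=1, keepdims=True)
    c_norms = np.linalg.norm(C, axis=1, keepdims=True)
    c_norms[c_norms == 0] = 1.0
    return (C / c_norms) @ Z  # shape (k, n_features)

def select_features(X, k, n_iter=50):
    """Cluster the columns of X into k groups; return one representative per group."""
    Z = standardise_features(np.asarray(X, dtype=float))
    n_features = Z.shape[1]
    if not 1 <= k <= n_features:
        raise ValueError("k must be between 1 and the number of features")

    # Farthest-first initialisation (an assumption): start from feature 0, then
    # repeatedly add the feature least similar to the centroids chosen so far.
    chosen = [0]
    while len(chosen) < k:
        sim = similarity_to_centroids(Z, Z[:, chosen].T).max(axis=0)
        sim[chosen] = np.inf
        chosen.append(int(sim.argmin()))
    centroids = Z[:, chosen].T.copy()

    labels = np.full(n_features, -1)
    for _ in range(n_iter):
        # Assignment step: each feature joins its most similar centroid.
        new_labels = similarity_to_centroids(Z, centroids).argmax(axis=0)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: a centroid is the mean of its member features.
        for j in range(k):
            members = Z[:, labels == j]
            if members.shape[1] > 0:
                centroids[j] = members.mean(axis=1)

    # Centroid-based selection: keep the member most similar to its centroid.
    sim = similarity_to_centroids(Z, centroids)
    selected = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size:
            selected.append(int(members[sim[j, members].argmax()]))
    return sorted(selected), labels
```

Calling `select_features(X, k)` on a samples-by-features array returns the indices of the retained features and the cluster label of every feature. No class labels are used, which matches the unsupervised setting described above. Swapping in LSRE or MICI would only require replacing `similarity_to_centroids`.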

Bibliographic record

Source: International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises, 2020, pp. 36-41 (6 pages)
Author: Naif Almusallam
Original format: PDF
Keywords: Feature extraction; Search problems; Partitioning algorithms; Indexes; Time complexity; Task analysis; Anomaly detection

Similar literature

1. Spectral ranking and unsupervised feature selection for point, collective, and contextual anomaly detection [J]. Haofan Zhang, Ke Nian, Thomas F. Coleman. International Journal of Data Science and Analytics, 2020, No. 1.
2. Data-driven Anomaly Detection with Timing Features for Embedded Systems [J]. Lu Sixing, Lysecky Roman. ACM Transactions on Design Automation of Electronic Systems, 2019, No. 3.
3. A novel unsupervised method for anomaly detection in time series based on statistical features for industrial predictive maintenance [J]. Jesimar da Silva Arantes, Marcio da Silva Arantes, Herberth Birck Froehlich. International Journal of Data Science and Analytics, 2021, No. 4.
4. Network Anomaly Detection Using Unsupervised Feature Selection and Density Peak Clustering [C]. Xiejun Ni, Daojing He, Sammy Chan. International Conference on Applied Cryptography and Network Security, 2016.
5. Unsupervised data mining methods for functional data analysis and feature selection [D]. Rattakorn, Panaya. 2009.
6. Ischemic Stroke Detection System with a Computer-Aided Diagnostic Ability Using an Unsupervised Feature Perception Enhancement Method [O]. Yeu-Sheng Tyan, Ming-Chi Wu, Chiun-Li Chin. 2014.
7. Design and performance analysis of various feature selection methods for anomaly-based techniques in intrusion detection system [O]. Sushant Kumar Pandey. 2019.
8. Improved Feature Extraction, Feature Selection, and Identification Techniques That Create a Fast Unsupervised Hyperspectral Target Detection Algorithm [R]. Johnson, R. J. 2008.
