Annual International Conference on Privacy, Security and Trust

Data preprocessing for distance-based unsupervised Intrusion Detection

Abstract

Since Intrusion Detection Systems (IDSs) operate in real time, they should be lightweight so that they detect intrusions as quickly as possible. Distance-based Outlier Detection (DBOD) is one of the most widely used techniques for detecting outliers because of its simplicity and efficiency. Additionally, DBOD is an unsupervised approach, which overcomes the lack of training datasets with known intrusions. However, since IDSs usually handle high-dimensional datasets, DBOD becomes subject to the curse of dimensionality. Furthermore, intrusion datasets should be normalized before calculating pairwise distances between observations. The purpose of this research is to conduct a comparative study of different normalization methods in conjunction with a well-known feature extraction technique, Principal Component Analysis (PCA), so that the efficiency of these methods as data preprocessing techniques can be investigated when applying DBOD to detect intrusions. Experiments were performed using two distance metrics: Euclidean distance and Mahalanobis distance. We further examined PCA using 7 threshold values that determine the number of principal components to retain according to their total contribution to the variability of the features. These approaches were evaluated using the KDD Cup 1999 intrusion detection (KDD-Cup) dataset. The main purpose of this study is to find the best attribute normalization method along with the correct threshold value for PCA so that a fast unsupervised IDS can discover intrusions effectively. The results recommend using the Log normalization method combined with the Euclidean distance while performing PCA.
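
The pipeline described in the abstract (normalize, reduce with PCA up to a variance threshold, then score observations by distance) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not give the exact form of the Log normalization, the 7 threshold values, or the DBOD variant, so the `log1p` transform, the `threshold=0.95` default, the k-th-nearest-neighbour score, and all function names below are assumptions made for illustration.

```python
import numpy as np


def log_normalize(X):
    """Assumed form of log normalization: log(1 + x) on non-negative features."""
    return np.log1p(X)


def pca_by_variance(X, threshold=0.95):
    """Project X onto the fewest principal components whose cumulative
    explained-variance ratio reaches `threshold` (an illustrative value)."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance explained by each principal component.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    n_comp = min(int(np.searchsorted(ratio, threshold)) + 1, len(s))
    return Xc @ Vt[:n_comp].T, n_comp


def knn_outlier_scores(Z, k=5, metric="euclidean"):
    """Score each row by the distance to its k-th nearest neighbour
    (one common distance-based outlier score, assumed here)."""
    if metric == "mahalanobis":
        # Whitening by the covariance eigendecomposition makes Euclidean
        # distance in the new space equal to Mahalanobis distance.
        # Assumes the covariance matrix is non-singular.
        cov = np.atleast_2d(np.cov(Z, rowvar=False))
        evals, evecs = np.linalg.eigh(cov)
        Z = Z @ (evecs / np.sqrt(evals))
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negative rounding errors
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbour
    kth = np.partition(d2, k - 1, axis=1)[:, k - 1]
    return np.sqrt(kth)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.gamma(2.0, 2.0, size=(200, 6))   # synthetic "normal" traffic
    X[:3] *= 1000.0                          # three planted anomalies
    Z, n_comp = pca_by_variance(log_normalize(X), threshold=0.9)
    scores = knn_outlier_scores(Z, k=5)
    print(n_comp, np.argsort(scores)[-3:])
```

Rows with the largest scores are the candidates for intrusions. The pairwise distance matrix costs memory quadratic in the number of observations, so a dataset the size of KDD Cup 1999 would need sampling or a neighbour index rather than this direct computation.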