Journal: Knowledge-Based Systems

Feature selection by multi-objective optimisation: Application to network anomaly detection by hierarchical self-organising maps

Abstract

Feature selection is an important and active research topic in clustering and classification. Choosing an adequate feature subset reduces the dimensionality of the dataset, which lowers the computational complexity of classification and improves classifier performance by discarding redundant or irrelevant features. Although feature selection can be formally defined as an optimisation problem with a single objective, namely the classification accuracy obtained with the selected feature subset, several multi-objective approaches to this problem have been proposed in recent years. For supervised classifiers, these approaches select features that improve not only the classification accuracy but also the generalisation capability; for unsupervised classifiers, they counterbalance the bias toward lower or higher numbers of features exhibited by some of the methods used to validate the clustering/classification. The main contribution of this paper is a multi-objective approach to feature selection and its application to an unsupervised clustering procedure based on Growing Hierarchical Self-Organising Maps (GHSOMs), which includes a new method for unit labelling and an efficient way of determining the winning unit. In the network anomaly detection problem considered here, this multi-objective approach makes it possible to differentiate not only between normal and anomalous traffic but also among different anomalies. The efficiency of our proposals has been evaluated on the well-known DARPA/NSL-KDD datasets, which contain extracted features and labelled attacks from around 2 million connections. The feature sets selected in our experiments provide detection rates of up to 99.8% for normal traffic and up to 99.6% for anomalous traffic, as well as accuracy values of up to 99.12%.
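
The two ideas named in the abstract, determining the winning unit of a map over a selected feature subset and keeping only the non-dominated feature subsets under several objectives, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the boolean feature mask, the squared Euclidean distance, and the two objectives (an error value and a feature count, both minimised) are assumptions made for illustration, and the numbers are toy values rather than results from the paper.

```python
from typing import List, Sequence, Tuple


def winning_unit(x: Sequence[float],
                 units: Sequence[Sequence[float]],
                 mask: Sequence[bool]) -> int:
    """Return the index of the unit closest to x.

    Only features whose mask entry is True contribute to the squared
    Euclidean distance (an assumed distance, chosen for illustration).
    """
    def dist2(w: Sequence[float]) -> float:
        return sum((xi - wi) ** 2 for xi, wi, m in zip(x, w, mask) if m)

    return min(range(len(units)), key=lambda i: dist2(units[i]))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Tuple[str, Tuple[float, ...]]]
                 ) -> List[Tuple[str, Tuple[float, ...]]]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other[1], c[1]) for other in candidates)]


# Toy map with two units and three features (hypothetical values).
units = [[0.0, 0.0, 0.0], [1.0, 1.0, 9.0]]
sample = [0.0, 0.0, 9.0]
all_features = winning_unit(sample, units, [True, True, True])
first_two_only = winning_unit(sample, units, [True, True, False])

# Toy candidates: (subset name, (error, number of features)), both minimised.
candidates = [("A", (0.10, 5)), ("B", (0.12, 3)),
              ("C", (0.15, 5)), ("D", (0.10, 6))]
front = pareto_front(candidates)
```

In this toy run the winning unit changes when the third feature is masked out, which shows how the selected subset affects the mapping, and the front keeps the subsets that trade error against feature count without being beaten on both.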
