Journal: Intelligent Data Analysis

Why machine learning algorithms fail in misuse detection on KDD intrusion detection data set

Abstract

A large set of machine learning and pattern classification algorithms trained and tested on the KDD intrusion detection data set failed to identify most of the user-to-root and remote-to-local attacks, as reported by many researchers in the literature. In light of this observation, this paper aims to expose the deficiencies and limitations of the KDD data set and argues that it should not be used to train pattern recognition or machine learning algorithms for misuse detection for these two attack categories. Multiple analysis techniques are employed to demonstrate, both objectively and subjectively, that the KDD training and testing data subsets represent dissimilar target hypotheses for the user-to-root and remote-to-local attack categories. These techniques consist of switching the roles of the original training and testing data subsets to develop a decision tree classifier, cross-validation on the merged training and testing data subsets, and qualitative and comparative analysis of rules generated independently on the training and testing data subsets through the C4.5 decision tree algorithm. The analysis results clearly suggest that no pattern classification or machine learning algorithm can be trained successfully with the KDD data set to perform misuse detection for the user-to-root or remote-to-local attack categories. It is further noted that the analysis techniques employed to assess the similarity between the two target hypotheses represented by the training and the testing data subsets can readily be generalized to data set pairs in other problem domains.
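The role-switching and merged cross-validation checks described in the abstract can be illustrated with a minimal sketch. This is not the authors' code, and it rests on stated assumptions: the data are synthetic rather than KDD records, a one-feature threshold rule stands in for the C4.5 decision tree, and all names (`fit_stump`, `cross_validate`, `make_subset`) are hypothetical. The two synthetic subsets are deliberately built so that the label depends on a different feature in each, mimicking "dissimilar target hypotheses".

```python
import random

def fit_stump(rows):
    """Fit a one-feature threshold rule (a simple stand-in, NOT C4.5)."""
    best = None
    for f in range(len(rows[0][0])):
        for x, _ in rows:
            t = x[f]  # candidate thresholds are the observed feature values
            for sign in (1, -1):
                correct = sum(
                    1 for xi, yi in rows
                    if (1 if sign * (xi[f] - t) > 0 else 0) == yi
                )
                if best is None or correct > best[0]:
                    best = (correct, f, t, sign)
    _, f, t, sign = best
    return f, t, sign

def predict(model, x):
    f, t, sign = model
    return 1 if sign * (x[f] - t) > 0 else 0

def accuracy(model, rows):
    return sum(predict(model, x) == y for x, y in rows) / len(rows)

def cross_validate(rows, k=5, seed=0):
    """Plain k-fold cross-validation; returns the mean held-out accuracy."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(accuracy(fit_stump(train), folds[i]))
    return sum(scores) / k

def make_subset(n, label_feature, seed):
    """Synthetic subset (assumption): the label depends on one chosen feature."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        rows.append((x, 1 if x[label_feature] > 0.5 else 0))
    return rows

# Two subsets whose labels follow different rules.
train_set = make_subset(200, label_feature=0, seed=1)
test_set = make_subset(200, label_feature=1, seed=2)

results = {
    "train->test": accuracy(fit_stump(train_set), test_set),
    "test->train": accuracy(fit_stump(test_set), train_set),  # roles switched
    "cv_train_only": cross_validate(train_set),
    "cv_merged": cross_validate(train_set + test_set),
}

for name, score in results.items():
    print(f"{name}: {score:.3f}")
```

In this toy setting, a classifier learned on one subset should score near chance on the other in both directions, and cross-validation on the merged data should score well below cross-validation within a single subset. A gap of that kind is the sort of evidence the abstract describes. The sketch says nothing about the actual magnitudes on the KDD data.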