Journal: Applied Soft Computing

Automatically building datasets of labeled IP traffic traces: A self-training approach


Abstract

Many approaches have been proposed to address computer network security. Several of these systems apply Machine Learning and Pattern Recognition techniques, treating malicious behavior detection as a classification problem. Both supervised and unsupervised algorithms have been used in this context, each with its own benefits and shortcomings. Supervised techniques require a representative training set whose suitably labeled samples reliably indicate what a human expert wants the system to learn and recognize. In real environments, collecting a representative dataset of correctly labeled traffic traces is very difficult. In adversarial environments the task is even harder, because malicious attackers try to conceal the evidence of their actions. To overcome this problem, this paper presents a self-training system that builds a dataset of labeled network traffic from raw tcpdump traces, with no prior knowledge of the data. Results on both emulated and real traffic traces show that intrusion detection systems trained on such a dataset perform as well as the same systems trained on correctly hand-labeled data.