Conference: International Conference on Network and System Security

A novel semi-supervised approach for network traffic clustering

Abstract

Network traffic classification is an essential component of network management and security systems. To address the limitations of traditional port-based and payload-based methods, recent studies have focused on alternative approaches. One promising direction is to apply machine learning techniques that classify traffic flows based on packet-level and flow-level statistics. In particular, previous papers have shown that clustering can achieve high accuracy and discover unknown application classes. In this work, we present a novel semi-supervised learning method using constrained clustering algorithms. The motivation is that, in the network domain, a lot of background information is available in addition to the data instances themselves. For example, we might know that flows ƒ1 and ƒ2 use the same application protocol because they visit the same host address at the same port simultaneously. In this case, ƒ1 and ƒ2 should ideally be grouped into the same cluster. Therefore, we describe these correlations in the form of pairwise must-link constraints and incorporate them into the clustering process. We have applied three constrained variants of the K-Means algorithm, which perform hard or soft constraint satisfaction and metric learning from constraints. A number of real-world traffic traces have been used to show the availability of constraints and to test the proposed approach. The experimental results indicate that incorporating constraints in the course of clustering can significantly improve the overall accuracy and cluster purity.
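
The idea of deriving must-link constraints from flows that reach the same host and port at about the same time, and then enforcing them during K-Means, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the three K-Means variants, so the code below only shows one simple form of hard constraint satisfaction, in which each must-link group is assigned to a centroid as a whole. The flow field names (`dst_ip`, `dst_port`, `proto`, `start`, `features`), the one-second `window`, the toy feature values, and the initial centroids are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def must_link_pairs(flows, window=1.0):
    """Pair flows that hit the same (host, port, protocol) within `window` seconds."""
    by_endpoint = defaultdict(list)
    for idx, flow in enumerate(flows):
        by_endpoint[(flow["dst_ip"], flow["dst_port"], flow["proto"])].append(idx)
    pairs = []
    for members in by_endpoint.values():
        for i, j in combinations(members, 2):
            if abs(flows[i]["start"] - flows[j]["start"]) <= window:
                pairs.append((i, j))
    return pairs


def must_link_groups(n, pairs):
    """Merge must-link pairs into connected groups with a small union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return list(groups.values())


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_kmeans(points, pairs, centroids, iters=20):
    """K-Means in which every must-link group receives a single shared label."""
    centroids = [tuple(c) for c in centroids]
    groups = must_link_groups(len(points), pairs)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment: pick the centroid with the lowest total squared
        # distance to all members of the group, so the group never splits.
        for group in groups:
            best = min(
                range(len(centroids)),
                key=lambda c: sum(sq_dist(points[i], centroids[c]) for i in group),
            )
            for i in group:
                labels[i] = best
        # Update: recompute each centroid as the mean of its members.
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Toy flows (hypothetical values): flows 0 and 1 visit the same host and port
# 0.2 s apart, although their statistical features differ.
flows = [
    {"dst_ip": "10.0.0.1", "dst_port": 443, "proto": "tcp", "start": 0.0, "features": (1.0, 1.0)},
    {"dst_ip": "10.0.0.1", "dst_port": 443, "proto": "tcp", "start": 0.2, "features": (6.0, 6.0)},
    {"dst_ip": "10.0.0.2", "dst_port": 53, "proto": "udp", "start": 5.0, "features": (9.0, 9.0)},
    {"dst_ip": "10.0.0.3", "dst_port": 80, "proto": "tcp", "start": 9.0, "features": (0.0, 0.0)},
]
points = [f["features"] for f in flows]
init = [(0.0, 0.0), (10.0, 10.0)]

pairs = must_link_pairs(flows)
labels, _ = constrained_kmeans(points, pairs, init)
plain_labels, _ = constrained_kmeans(points, [], init)
print(pairs, labels, plain_labels)
```

In this toy run, plain K-Means separates flows 0 and 1 because their features lie closer to different centroids, while the constrained run keeps them together. Soft constraint satisfaction and metric learning, which the abstract also mentions, would instead penalize violated constraints or reweight the distance function; neither is shown here.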
