IEEE Transactions on Parallel and Distributed Systems

Internet Traffic Classification Using Constrained Clustering

Abstract

Statistics-based Internet traffic classification using machine learning techniques has recently attracted extensive research interest because traditional port-based and payload-based approaches are becoming increasingly ineffective. In particular, unsupervised learning, that is, traffic clustering, is very important in real-life applications, where labeled training data are difficult to obtain and new patterns keep emerging. Although previous studies have applied classic clustering algorithms such as K-Means and EM to this task, the quality of the resulting traffic clusters was far from satisfactory. To improve the accuracy of traffic clustering, we propose a constrained clustering scheme that makes decisions using background information in addition to the observed traffic statistics. Specifically, we use equivalence set constraints, which indicate that particular sets of flows use the same application-layer protocol; these constraints can be inferred efficiently from packet headers using background knowledge of TCP/IP networking. We model the observed data and constraints with a Gaussian mixture density and adapt an approximate algorithm for maximum likelihood estimation of the model parameters. Moreover, we study the effects of unsupervised feature discretization on traffic clustering using a basic binning method. We evaluate the approach on a number of real-world Internet traffic traces, and the results show that it not only improves the quality of traffic clusters in terms of overall accuracy and per-class metrics but also speeds up convergence.
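The abstract does not spell out the estimation procedure, so the following is only an illustrative sketch of how equivalence set constraints can enter EM for a Gaussian mixture. It assumes the simplest formulation for positive (same-protocol) constraints: each equivalence set, or "chunklet", draws one mixture component, and all of its flows are generated from that component, so the E-step scores the whole set as a unit. It also assumes diagonal covariances, a farthest-point initialisation, and equivalence sets that have already been inferred (the header-based inference rule is not specified here). The function names `constrained_em`, `farthest_point_init`, and the other helpers are hypothetical, and this is not claimed to be the paper's approximate algorithm.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log-density of vector x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def log_sum_exp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def farthest_point_init(X, k, rng):
    """Hypothetical init: one random flow, then repeatedly the flow
    farthest from all means chosen so far."""
    means = [list(X[rng.randrange(len(X))])]
    while len(means) < k:
        far = max(X, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, m))
                                       for m in means))
        means.append(list(far))
    return means


def constrained_em(X, chunklets, k, iters=50, min_var=1e-3, seed=0):
    """EM for a diagonal-covariance Gaussian mixture in which all flows of
    an equivalence set (chunklet) share one mixture component.

    X: list of feature vectors, one per flow.
    chunklets: index lists covering every flow exactly once; unconstrained
        flows appear as singleton chunklets.
    Returns (weights, means, variances, labels).
    """
    rng = random.Random(seed)
    d = len(X[0])
    means = farthest_point_init(X, k, rng)
    centre = [sum(x[j] for x in X) / len(X) for j in range(d)]
    spread = [max(sum((x[j] - centre[j]) ** 2 for x in X) / len(X), min_var)
              for j in range(d)]
    variances = [list(spread) for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: a chunklet picks ONE component, so its score under
        # component c adds up the log-likelihoods of all its member flows.
        resp = []
        for ch in chunklets:
            scores = [math.log(weights[c]) +
                      sum(log_gauss_diag(X[i], means[c], variances[c]) for i in ch)
                      for c in range(k)]
            z = log_sum_exp(scores)
            resp.append([math.exp(s - z) for s in scores])
        # M-step: weights count chunklets; means and variances count flows,
        # each flow weighted by its chunklet's responsibility.
        raw = [max(sum(r[c] for r in resp) / len(chunklets), 1e-12) for c in range(k)]
        total = sum(raw)
        weights = [w / total for w in raw]
        for c in range(k):
            n_c = sum(r[c] * len(ch) for r, ch in zip(resp, chunklets))
            if n_c < 1e-12:
                continue  # empty component: keep its previous parameters
            means[c] = [sum(r[c] * X[i][j] for r, ch in zip(resp, chunklets) for i in ch) / n_c
                        for j in range(d)]
            variances[c] = [max(sum(r[c] * (X[i][j] - means[c][j]) ** 2
                                    for r, ch in zip(resp, chunklets) for i in ch) / n_c,
                                min_var)
                            for j in range(d)]
    # Every flow inherits the most likely component of its chunklet.
    labels = [0] * len(X)
    for r, ch in zip(resp, chunklets):
        best = max(range(k), key=lambda c: r[c])
        for i in ch:
            labels[i] = best
    return weights, means, variances, labels
```

For example, with flows `[(0.0, 0.0), (0.2, 0.1), (0.1, -0.1), (-0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1), (5.1, 5.2)]` and equivalence sets `[[0, 1], [2], [3], [4, 5, 6], [7]]`, calling `constrained_em(flows, sets, k=2)` places flows 0 to 3 in one component and flows 4 to 7 in the other; which group receives label 0 depends on the seed. Because the constraint ties labels together inside the E-step, flows in the same set can never be split, which is the property the constrained scheme relies on.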