Journal: 《计算机学报》 (Chinese Journal of Computers)

An Adaptive Classification Method for Network Traffic under Concept Drift Based on Information Entropy

         

Abstract

In recent years, traffic classification based on machine learning has achieved high accuracy. Nevertheless, it depends heavily on the environment in which the training samples were collected. A classifier that is trained accurately in one network environment can lose much of its accuracy when it classifies traffic under different network conditions. Because traffic statistics and distributions change dynamically, machine learning-based classifiers must be updated periodically to maintain performance, and this issue is unavoidable for machine learning-based traffic classification. Existing solutions lack explicit recommendations on when a classifier should be updated and how to update it effectively. This results in several shortcomings: (1) Updating a traditional traffic classifier is time-consuming, and the cost is tied to how often a classifier is updated and when a new classifier is needed. (2) Training a new classifier only on new traffic loses some previously learned knowledge, while updating a classifier on a large dataset that combines all collected data further affects performance. (3) Traffic statistics and distributions change dynamically with network conditions, so it is hard to obtain a stable feature subset for building a robust classifier. Building a classifier that adapts to changing network conditions is therefore a major challenge.

In this paper, we develop an adaptive traffic classification method that uses entropy-based detection and incremental ensemble learning, assisted by embedded feature selection. To update the classifier in a timely and effective way, the entropy-based detection uses a sliding-window technique to measure the statistical difference between previous and current traffic samples by counting and comparing all instances with respect to their feature stream membership. Additionally, we discretize the range of feature values into a fixed number of bins to take the approximate value distribution into account. The incremental ensemble learning scheme retains previously trained classifiers, introduces a classifier retrained on current traffic, and removes classifiers whose performance has degraded. Furthermore, several feature selectors are integrated to obtain feature subsets that generalize robustly. A comprehensive performance evaluation on two real-world network traffic data sets shows that our approach can effectively detect concept drift under changing network conditions and update the classifier with high accuracy and generalization ability.

The major contributions of this work are as follows. First, this paper presents an adaptive traffic classification system based on concept drift detection. Information entropy is used to detect concept drift from the entropy change of feature attributes, and this detection method does not require class information for the flows. Second, the classifiers are updated according to the result of concept drift detection rather than at a fixed period. Third, the method uses an ensemble learning strategy to introduce classifiers built on new samples and to eliminate classifiers whose performance has degraded, in order to optimize the classification model. Fourth, mutual information is introduced to evaluate features for concept drift detection. The results show that the mutual information between packet size and protocol is high and stable, which indicates that this feature is suitable for concept drift detection. Fifth, this paper uses the Hoeffding bound to determine the window size for concept drift detection. An appropriate window size is important for fast and effective concept drift detection.

Because network traffic features change over time and with the network environment, the accuracy of machine learning-based traffic classification methods drops markedly. Meanwhile, updating the classifier periodically on the basis of experience is time-consuming, and the generalization performance of the new classifier is hard to guarantee. This paper therefore proposes an information entropy-based adaptive classification method for network traffic under concept drift. The method first detects concept drift from changes in the information entropy of feature attributes. It then uses an incremental ensemble learning strategy to introduce a classifier built on current traffic at the drift point and to remove classifiers whose performance has degraded, thereby updating the classifier. Finally, it combines the classification results by weighting. Experimental results show that the method can effectively detect concept drift and update the classifier, and that it exhibits good classification performance and generalization ability.
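The window comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a single numeric feature, equal-width bins, and a bin-weighted entropy of window membership as the drift score. The function names, the 0.9 threshold, and the parameters `epsilon` and `delta` are hypothetical choices made for illustration. The window-size helper applies the standard Hoeffding inequality, n >= R^2 ln(1/delta) / (2 epsilon^2); how the paper derives its window size from the bound is not stated in the abstract.

```python
import math
from collections import Counter


def membership_entropy(previous, current, lo, hi, n_bins=10):
    """Bin-weighted entropy (bits) of window membership for one feature.

    Near 1.0: both windows fill the bins alike (for equal-sized windows).
    Near 0.0: the windows occupy different bins, which suggests drift.
    """
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and n_bins >= 1")
    width = (hi - lo) / n_bins

    def bin_of(value):
        # Clamp out-of-range values into the first or last bin.
        return min(n_bins - 1, max(0, int((value - lo) / width)))

    old = Counter(bin_of(v) for v in previous)
    new = Counter(bin_of(v) for v in current)
    total = len(previous) + len(current)
    entropy = 0.0
    for b in set(old) | set(new):
        mass = old[b] + new[b]
        for count in (old[b], new[b]):
            if count:
                p = count / mass
                entropy -= (mass / total) * p * math.log2(p)
    return entropy


def hoeffding_window_size(value_range, epsilon, delta):
    """Smallest n with sqrt(R^2 * ln(1/delta) / (2n)) <= epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta)
                     / (2.0 * epsilon ** 2))


def drift_detected(previous, current, lo, hi, n_bins=10, threshold=0.9):
    """Flag drift when the membership entropy falls below a threshold.

    The threshold is an assumed tuning parameter, not a value from the paper.
    """
    return membership_entropy(previous, current, lo, hi, n_bins) < threshold
```

In this sketch, two windows with the same binned distribution score 1.0 and two windows that share no bins score 0.0, so a lower score means a larger statistical difference between previous and current traffic. The score uses no class labels, which matches the abstract's statement that detection does not require class information.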
