International Joint Conference on Neural Networks (IJCNN 2009)

Learning on Class Imbalanced Data to Classify Peer-to-Peer Applications in IP Traffic using Resampling Techniques

Abstract

In many applications, one class of data is represented by a large number of examples while the other is represented by only a few. For instance, in our previous work on identifying peer-to-peer (P2P) Internet traffic, we observed that only about 30% of examples can be labeled as "P2P" using a port-based heuristic rule, and even fewer examples will be labelable in the future as more and more P2P applications use dynamic ports. This paper studies the effect of three resampling techniques on balancing the class distribution when training C4.5 and neural networks to identify P2P traffic. The experimental data were captured at our campus gateway. The experiments use nine datasets with different percentages of "P2P" examples and six datasets of different sizes with the actual percentage of about 30% "P2P" examples. The results show that resampling techniques are effective and stable, and that random over-sampling is a good choice for P2P traffic identification when both classification performance and time complexity are considered.
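
The abstract does not spell out how random over-sampling was implemented, so the following is only a generic sketch of the idea: minority-class examples are duplicated at random until every class matches the size of the largest one. The function name `random_oversample`, the toy two-value flow features, and the fixed seed are all assumptions made for illustration; none of them come from the paper.

```python
import random
from collections import Counter


def random_oversample(examples, labels, seed=0):
    """Duplicate randomly chosen examples of the smaller classes until
    every class has as many examples as the largest class.
    Hypothetical helper for illustration, not the authors' code."""
    rng = random.Random(seed)

    # Group the examples by class label.
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Sample with replacement to fill the gap for this class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy data (made up): 3 "P2P" flows out of 10, mirroring the roughly 30%
# share mentioned in the abstract.
flows = [[i, i * 2] for i in range(10)]
flow_labels = ["P2P"] * 3 + ["non-P2P"] * 7

balanced_x, balanced_y = random_oversample(flows, flow_labels)
print(Counter(balanced_y))
```

Resampling of this kind would normally be applied to the training split only, so that the evaluation data keep their original class distribution.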