Journal: Peer-to-peer networking and applications

Exploiting unlabeled data to improve peer-to-peer traffic classification using incremental tri-training method - Springer

Abstract

Unlabeled training examples are readily available in many applications, but labeled examples are expensive to obtain. For instance, in our previous work on classifying peer-to-peer (P2P) Internet traffic, we observed that only about 25% of examples could be labeled as “P2P” or “NonP2P” using a port-based heuristic rule. We expect that even fewer examples can be labeled in the future as more and more P2P applications use dynamic ports. This motivates us to investigate techniques that improve the accuracy of P2P traffic classification by exploiting unlabeled examples. In addition, Internet data arrives dynamically and in large volumes (streaming data). In P2P applications, new communities of peers often join and old communities often leave, so classifiers must be able to update their models incrementally and cope with concept drift. Based on these requirements, this paper proposes an incremental Tri-Training (iTT) algorithm. We tested our approach on a real data stream with 7.2 million labeled examples and 20.4 million unlabeled examples. The results show that the iTT algorithm improves the accuracy of P2P traffic classification by exploiting unlabeled examples. It also handles the dynamic nature of streaming data effectively and detects changes in communities of peers. We extracted attributes only from the IP layer, which avoids the privacy concerns associated with techniques that use deep packet inspection.
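The abstract does not spell out how iTT works, so the following is only a minimal sketch of the general tri-training idea applied batch by batch, not the authors' algorithm. Three classifiers are trained on different subsets of the labeled data, and an unlabeled example is pseudo-labeled for one classifier whenever the other two agree on it. Everything here is an assumption made for illustration: the names `CentroidClassifier`, `tri_train_batch`, and `predict_majority`, the nearest-centroid base learner, the two numeric features, and the deterministic subsampling used in place of bootstrap sampling. The error-rate check of standard tri-training and any concept-drift handling are omitted.

```python
from collections import defaultdict


class CentroidClassifier:
    """Hypothetical base learner: nearest-centroid with incremental updates."""

    def __init__(self):
        self.sums = {}                    # label -> running per-feature sums
        self.counts = defaultdict(int)    # label -> number of examples seen

    def update(self, x, label):
        """Fold one example into the running centroid of its label."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
        for i, value in enumerate(x):
            self.sums[label][i] += value
        self.counts[label] += 1

    def predict(self, x):
        """Return the label whose centroid is closest to x."""
        def sq_dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))
        return min(self.sums, key=sq_dist)


def tri_train_batch(classifiers, labeled, unlabeled):
    """Process one batch of the stream; return the number of pseudo-labels used."""
    # Diversify the three learners: classifier i skips every labeled example
    # whose index is congruent to i modulo 3 (a stand-in for bootstrapping).
    for i, clf in enumerate(classifiers):
        for idx, (x, y) in enumerate(labeled):
            if idx % 3 != i:
                clf.update(x, y)
    # Pseudo-labeling needs every classifier to know at least one label.
    if any(not clf.sums for clf in classifiers):
        return 0
    added = 0
    for x in unlabeled:
        votes = [clf.predict(x) for clf in classifiers]
        for i, clf in enumerate(classifiers):
            a, b = (votes[j] for j in range(3) if j != i)
            if a == b:                    # the two other classifiers agree
                clf.update(x, a)          # teach classifier i their label
                added += 1
    return added


def predict_majority(classifiers, x):
    """Final prediction: majority vote of the three classifiers."""
    votes = [clf.predict(x) for clf in classifiers]
    return max(set(votes), key=votes.count)


if __name__ == "__main__":
    # Toy flows with two made-up features; labels follow the abstract's classes.
    labeled = [((0, 0), "NonP2P"), ((10, 10), "P2P"), ((1, 1), "NonP2P"),
               ((11, 10), "P2P"), ((0, 1), "NonP2P"), ((10, 11), "P2P")]
    unlabeled = [(0.5, 0.5), (9.5, 10.5), (1.0, 0.0)]
    clfs = [CentroidClassifier() for _ in range(3)]
    print(tri_train_batch(clfs, labeled, unlabeled))
    print(predict_majority(clfs, (9.0, 9.0)))
```

Calling `tri_train_batch` once per arriving batch keeps the models updated without retraining from scratch, which is the incremental property the abstract asks for. A real system would also need a mechanism for forgetting outdated data to cope with concept drift.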
