Journal of Network and Systems Management

Automated Dataset Generation for Training Peer-to-Peer Machine Learning Classifiers

Abstract

Peer-to-peer (P2P) classification based on flow statistics has been shown to detect P2P traffic accurately. However, the performance of a machine learning classifier depends on the quality and recency of its training dataset, so classifying P2P traffic on-line requires overcoming these limitations. This paper proposes automated training dataset generation for on-line P2P traffic classification, allowing the classifier to be retrained frequently. The proposed two-stage training dataset generator (TSTDG) combines a 3-class heuristic classification with a 3-class statistical classification to generate the training dataset automatically. In the heuristic stage, traffic is classified as P2P, non-P2P, or unknown. In the statistical stage, a dual Decision Tree is built from the dataset produced by the heuristic stage to reduce the amount of traffic classified as unknown. The final training dataset is built from all flows classified in these two stages. The proposed system has been evaluated on traces captured from a campus network. The results show that TSTDG generates an accurate training dataset, classifying around 94% of all flows with high accuracy (98.59%) and a low false positive rate (1.27%).
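The two-stage pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the port lists in the heuristic stage are hypothetical, the two statistical features (mean packet size, flow duration) are assumed, and "dual Decision Tree" is read here, hypothetically, as two small CART-style trees trained on separate features, with an unknown flow relabelled only when both trees agree. The names `Flow`, `heuristic_stage`, `build_tree`, `predict`, and `tstdg` are illustrative.

```python
from collections import Counter
from dataclasses import dataclass

P2P, NON_P2P, UNKNOWN = "P2P", "non-P2P", "unknown"

# Hypothetical port rules for illustration only (BitTorrent 6881-6889, eDonkey 4662).
P2P_PORTS = set(range(6881, 6890)) | {4662}
NON_P2P_PORTS = {25, 53, 80, 110, 143, 443}


@dataclass
class Flow:
    src_port: int
    dst_port: int
    features: tuple  # assumed: (mean packet size in bytes, duration in seconds)


def heuristic_stage(flow):
    """Stage 1: 3-class heuristic labelling (P2P, non-P2P, or unknown)."""
    ports = {flow.src_port, flow.dst_port}
    if ports & P2P_PORTS:
        return P2P
    if ports & NON_P2P_PORTS:
        return NON_P2P
    return UNKNOWN


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, depth=0, max_depth=3):
    """Minimal CART-style tree using Gini impurity.

    A leaf is a label string; an internal node is (feature, threshold, left, right).
    """
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[f] <= t]
            right = [lab for x, lab in zip(X, y) if x[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    li = [i for i, x in enumerate(X) if x[f] <= t]
    ri = [i for i, x in enumerate(X) if x[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))


def predict(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def tstdg(flows, max_depth=3):
    """Return the generated training set as a list of (features, label) pairs."""
    stage1 = [(fl, heuristic_stage(fl)) for fl in flows]
    labelled = [(fl.features, lab) for fl, lab in stage1 if lab != UNKNOWN]
    if len({lab for _, lab in labelled}) < 2:
        return labelled  # both classes are needed to train the trees
    X = [x for x, _ in labelled]
    y = [lab for _, lab in labelled]
    # Stage 2 (assumed reading of "dual"): one tree per feature, i.e. two trees.
    trees = [build_tree([(x[i],) for x in X], y, max_depth=max_depth)
             for i in range(len(X[0]))]
    dataset = list(labelled)
    for fl, lab in stage1:
        if lab == UNKNOWN:
            votes = {predict(tree, (fl.features[i],)) for i, tree in enumerate(trees)}
            if len(votes) == 1:  # trees agree: accept; otherwise leave it out
                dataset.append((fl.features, votes.pop()))
    return dataset
```

Accepting a relabelled flow only when both trees agree trades coverage for precision, which fits the goal of a training set that is accurate rather than complete; flows on which the trees disagree stay unknown and are excluded from the generated dataset.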