Automated Dataset Generation for Training Peer-to-Peer Machine Learning Classifiers

Roozbeh Zarei; Alireza Monemi; Muhammad Nadzir Marsono

Abstract

Peer-to-peer (P2P) traffic classifiers based on flow statistics have been shown to detect P2P traffic accurately. However, the performance of a machine learning classifier depends on the quality and recency of its training dataset, so on-line P2P traffic classification requires a way to overcome these limitations. This paper proposes automated training dataset generation for on-line P2P traffic classification, which allows the classifier to be retrained frequently. The proposed two-stage training dataset generator (TSTDG) combines a 3-class heuristic classification with a 3-class statistical classification to generate a training dataset automatically. In the heuristic stage, traffic is classified as P2P, non-P2P, or unknown. In the statistical stage, a dual Decision Tree is built from the dataset generated in the heuristic stage to reduce the amount of traffic classified as unknown. The final training dataset consists of all flows classified in these two stages. The proposed system was evaluated on traces captured from a campus network. Overall, the results show that the TSTDG can generate an accurate training dataset, classifying around 94% of all flows with high accuracy (98.59%) and a low false positive rate (1.27%).
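
The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Flow` fields, the two heuristic rules, the port set, the `min_purity` cutoff, and all function names are hypothetical, and a single depth-1 decision stump stands in for the paper's dual Decision Tree, whose construction the abstract does not describe.

```python
from dataclasses import dataclass
from typing import List, Tuple

P2P, NON_P2P, UNKNOWN = "p2p", "non_p2p", "unknown"

# Assumption: a small illustrative set of well-known server ports.
WELL_KNOWN_PORTS = {25, 53, 80, 443}


@dataclass
class Flow:
    dst_port: int
    mean_pkt_size: float     # hypothetical flow statistic used in stage two
    uses_tcp_and_udp: bool   # hypothetical input to the stage-one heuristic


Leaf = Tuple[str, float]           # (majority label, purity of the leaf)
Stump = Tuple[float, Leaf, Leaf]   # (threshold, left leaf, right leaf)


def heuristic_label(flow: Flow) -> str:
    """Stage one: 3-class heuristic (the rules here are assumptions)."""
    if flow.uses_tcp_and_udp:
        return P2P
    if flow.dst_port in WELL_KNOWN_PORTS:
        return NON_P2P
    return UNKNOWN


def _leaf(labels: List[str]) -> Leaf:
    """Summarise the seed labels that fall into one side of the split."""
    if not labels:
        return (UNKNOWN, 0.0)
    n_p2p = sum(y == P2P for y in labels)
    n_non = len(labels) - n_p2p
    label = P2P if n_p2p >= n_non else NON_P2P
    return (label, max(n_p2p, n_non) / len(labels))


def fit_stump(xs: List[float], ys: List[str]) -> Stump:
    """Fit a depth-1 tree: choose the threshold that misclassifies fewest seeds."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        l, r = _leaf(left), _leaf(right)
        errors = len(left) * (1 - l[1]) + len(right) * (1 - r[1])
        if best is None or errors < best[0]:
            best = (errors, t, l, r)
    return best[1], best[2], best[3]


def statistical_label(stump: Stump, x: float, min_purity: float = 0.9) -> str:
    """Stage two: 3-class output; impure leaves keep the flow unknown."""
    threshold, left, right = stump
    label, purity = left if x < threshold else right
    return label if purity >= min_purity else UNKNOWN


def generate_training_set(flows: List[Flow],
                          min_purity: float = 0.9) -> List[Tuple[Flow, str]]:
    """Return all flows labelled in either stage; unknown flows are dropped."""
    stage1 = [(f, heuristic_label(f)) for f in flows]
    seed = [(f, y) for f, y in stage1 if y != UNKNOWN]
    if not seed:
        return []
    stump = fit_stump([f.mean_pkt_size for f, _ in seed], [y for _, y in seed])
    dataset = []
    for flow, label in stage1:
        if label == UNKNOWN:
            label = statistical_label(stump, flow.mean_pkt_size, min_purity)
        if label != UNKNOWN:
            dataset.append((flow, label))
    return dataset
```

In this sketch, flows that the heuristic cannot label are passed to a classifier trained only on the heuristically labelled flows, which mirrors the abstract's idea of using the second stage to shrink the unknown class. A real system would use many flow features and a full decision tree learner rather than a single-feature stump.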

Bibliographic details

Source: Journal of Network and Systems Management, 2015, Issue 1, pp. 89-110 (22 pages)
Authors: Roozbeh Zarei; Alireza Monemi; Muhammad Nadzir Marsono
Author affiliations:

Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 Johor Bahru, Malaysia; Centre for Applied Informatics, College of Engineering and Science, Victoria University, PO Box 14428, Melbourne, VIC 8001, Australia

Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 Johor Bahru, Malaysia

Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 Johor Bahru, Malaysia

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords: Traffic classification; Peer-to-peer traffic; Machine learning; Training dataset; Two-stage classifier

