Journal: Information

A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark


Abstract

With the rapid growth of data scale in network traffic classification, selecting traffic features efficiently has become a major challenge. Although a number of traditional feature selection methods based on the Hadoop-MapReduce framework have been proposed, their execution time remains unsatisfactory because of the numerous iterative computations involved in processing. To address this issue, this paper proposes an efficient feature selection method for network traffic based on Spark, a newer parallel computing framework. In our approach, the complete feature set is first preprocessed using the Fisher score, and a sequential forward search strategy is then employed to search for feature subsets. The optimal feature subset is selected through successive iterations on the Spark computing framework. The implementation demonstrates that, while maintaining classification accuracy, our method reduces the time cost of modeling and classification and significantly improves the execution efficiency of feature selection.
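
The two steps named in the abstract, Fisher-score preprocessing followed by a sequential forward search, can be illustrated with a small single-machine sketch. This is not the paper's Spark implementation: the function names, the `top_k` cutoff, the toy data, and the nearest-centroid evaluator are all assumptions made for illustration, and the Fisher score uses one common definition (between-class scatter divided by within-class scatter, per feature).

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Return one Fisher score per feature column (X is a list of rows)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    scores = []
    for j in range(len(X[0])):
        overall_mean = fmean(row[j] for row in X)
        between = within = 0.0
        for rows in by_class.values():
            col = [row[j] for row in rows]
            between += len(col) * (fmean(col) - overall_mean) ** 2
            within += len(col) * pvariance(col)
        if within > 0:
            scores.append(between / within)
        else:
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def nearest_centroid_accuracy(X, y, features):
    """Hypothetical evaluator: training accuracy of a nearest-centroid rule."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append([row[j] for j in features])
    centroids = {c: [fmean(col) for col in zip(*rows)]
                 for c, rows in by_class.items()}
    correct = 0
    for row, label in zip(X, y):
        point = [row[j] for j in features]
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == label
    return correct / len(y)


def sequential_forward_search(X, y, top_k, evaluate):
    """Keep the top_k features by Fisher score, then add them greedily
    while the evaluation score keeps improving."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    candidates = ranked[:top_k]
    selected, best = [], 0.0
    while candidates:
        trials = [(evaluate(X, y, selected + [j]), j) for j in candidates]
        acc, feature = max(trials, key=lambda t: t[0])
        if acc <= best:  # no candidate improves the current subset
            break
        selected.append(feature)
        candidates.remove(feature)
        best = acc
    return selected, best


# Toy data (made up): feature 0 separates the classes, features 1 and 2 do not.
X = [
    [1.0, 5.0, 0.2], [1.2, 3.0, 0.9], [0.9, 4.0, 0.4], [1.1, 6.0, 0.7],
    [3.0, 5.0, 0.3], [3.2, 3.0, 0.8], [2.9, 4.0, 0.5], [3.1, 6.0, 0.6],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

scores = fisher_scores(X, y)
selected, best = sequential_forward_search(X, y, top_k=2,
                                           evaluate=nearest_centroid_accuracy)
print("Fisher scores:", scores)
print("Selected features:", selected, "accuracy:", best)
```

In a distributed setting, the per-class sums behind the Fisher score and the repeated subset evaluations are the parts that would be spread across workers; how the paper maps them onto Spark is not described in the abstract.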
