Conference: International Conference on Information and Communication Technologies for Disaster Management

An empirical study on optimization of training dataset in harmfulness prediction of code clone using ensemble feature selection model



Abstract

To address irrelevant features and imbalanced class distributions in code clone harmfulness prediction, an integrated classifier algorithm based on RUS (Random Under-Sampling) and a Wrapper method was proposed. First, the majority-class samples in the training dataset were randomly under-sampled into several sets sized in proportion to the minority class, and each set was combined with the minority-class samples to create multiple distinct training subsets. Next, a Wrapper based on the sequential floating forward search algorithm was used to select an optimal feature subset for each training subset, so that every sampling proportion was mapped to its corresponding optimal feature subset. Finally, a random forest classifier was used to evaluate the resulting optimized training dataset. The experimental results showed that, applied to code clone harmfulness prediction, the integrated classifier algorithm improved accuracy, F1-measure, and AUC by about 7% on average. Compared with four similar optimization methods, its AUC was 10.3% higher, which supports the feasibility and effectiveness of the ensemble feature selection model.
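The two preprocessing steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the sampling ratios, the stopping rule, or the wrapper's scoring procedure, so the names `undersample_subsets`, `sffs`, and `select_per_subset`, the `ratios` argument, and the pluggable `score` callable are all hypothetical. In a real wrapper, `score` would typically return a cross-validated performance estimate of a classifier trained on the chosen features.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

# Assumption: a sample is (feature vector, label), with label 1 = minority class.
Sample = Tuple[Sequence[float], int]
Score = Callable[[FrozenSet[int]], float]


def undersample_subsets(samples: List[Sample], ratios: Sequence[float],
                        seed: int = 0) -> Dict[float, List[Sample]]:
    """Random under-sampling: for each ratio r, keep every minority sample and
    draw about r * |minority| majority samples without replacement."""
    rng = random.Random(seed)
    minority = [s for s in samples if s[1] == 1]
    majority = [s for s in samples if s[1] == 0]
    subsets = {}
    for r in ratios:
        k = min(len(majority), int(round(r * len(minority))))
        subsets[r] = minority + rng.sample(majority, k)
    return subsets


def sffs(n_features: int, score: Score, k: int) -> FrozenSet[int]:
    """Sequential floating forward selection of at most k feature indices.
    `score` maps a feature subset to a number (higher is better)."""
    k = min(k, n_features)
    selected: FrozenSet[int] = frozenset()
    best = {0: (selected, float("-inf"))}  # best subset seen for each size
    while len(selected) < k:
        # Forward step: add the single feature that gives the highest score.
        selected = max((selected | {f} for f in range(n_features)
                        if f not in selected), key=score)
        s = score(selected)
        if len(selected) not in best or s > best[len(selected)][1]:
            best[len(selected)] = (selected, s)
        # Floating step: drop a feature while doing so beats the best
        # subset previously recorded for the smaller size.
        while len(selected) > 2:
            reduced = max((selected - {f} for f in selected), key=score)
            rs = score(reduced)
            if rs > best[len(reduced)][1]:
                selected = reduced
                best[len(reduced)] = (reduced, rs)
            else:
                break
    return max(best.values(), key=lambda pair: pair[1])[0]


def select_per_subset(subsets: Dict[float, List[Sample]], n_features: int,
                      make_score: Callable[[List[Sample]], Score],
                      k: int) -> Dict[float, FrozenSet[int]]:
    """Map each sampling ratio to the feature subset chosen on its data."""
    return {r: sffs(n_features, make_score(sub), k)
            for r, sub in subsets.items()}
```

Each resulting (training subset, feature subset) pair could then be passed to a random forest for evaluation, as the abstract describes; that final step is omitted here because it depends on a classifier library the abstract does not name.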
