BMC Bioinformatics

CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests

Abstract

Background: The random forests algorithm is a classifier with broad applicability and robustness against overfitting, but it still has some drawbacks. To improve the performance of random forests, this paper addresses imbalanced data processing, feature selection and parameter optimization.

Results: We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data show that combining Clustering Using Representatives (CURE) with the original synthetic minority oversampling technique (SMOTE) effectively improves it, compared with classification results obtained on the original data with random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, a hybrid RF (random forests) algorithm is proposed for feature selection and parameter optimization; it uses the minimum out-of-bag (OOB) data error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms (the hybrid genetic-random forests algorithm, the hybrid particle swarm-random forests algorithm and the hybrid fish swarm-random forests algorithm) can achieve the minimum OOB error and show the best generalization ability.

Conclusion: The training set produced by the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise, so this feasible and effective algorithm yields better classification results. Moreover, the F-value, G-mean, AUC and OOB scores of the hybrid algorithms surpass those of the original RF algorithm. Hence, the hybrid algorithm provides a new way to perform feature selection and parameter optimization.
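
To make the oversampling idea concrete, the following is a minimal sketch of the plain SMOTE interpolation step only: a synthetic minority point is placed on the line segment between a minority sample and one of its nearest minority neighbours. It does not reproduce the CURE clustering stage described in the abstract, and the function name `smote_interpolate`, its parameters, and the toy data are illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def smote_interpolate(minority, k=3, n_new=5, seed=0):
    """Hypothetical helper: create n_new synthetic points by interpolating
    between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic


# Toy minority class (assumed data, for illustration only)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_interpolate(minority, k=2, n_new=6)
```

Because every synthetic point lies between two existing minority samples, noisy or outlying samples can generate noisy synthetic points; per the abstract, the CURE step is what the authors add to keep such noise minimal.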
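
The abstract reports F-value and G-mean among its evaluation scores. As a rough sketch, these can be computed from binary confusion-matrix counts using the commonly used definitions, with the minority class treated as positive. The abstract does not state the exact formulas used in the paper, so the equal weighting of precision and recall in the F-value, the function name `f_value_and_g_mean`, and the example counts are all assumptions.

```python
import math


def f_value_and_g_mean(tp, fp, tn, fn):
    """Hypothetical helper: F-value and G-mean from confusion-matrix counts.

    Assumes all denominators are non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # true positive rate (minority class)
    specificity = tn / (tn + fp)   # true negative rate (majority class)
    f_value = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return f_value, g_mean


# Assumed example counts for an imbalanced test set (10 positives, 90 negatives)
f_value, g_mean = f_value_and_g_mean(tp=8, fp=2, tn=88, fn=2)
```

Both scores depend on minority-class recall, which is why they are preferred over plain accuracy on imbalanced data: a classifier that predicts only the majority class would reach 90% accuracy on these counts but a G-mean of zero.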
