Conference: International Conference on Advanced Data Mining and Applications (ADMA 2011)

An Empirical Evaluation of Bagging with Different Algorithms on Imbalanced Data


Abstract

This study investigates how effective bagging is with different learning algorithms on imbalanced data sets. It does so through two approaches: (1) categorizing base learners, across 12 different learning algorithms, in general terms, and (2) evaluating the performance of bagging predictors on data with imbalanced class distributions. The first approach develops a method for categorizing base learners using a two-dimensional robustness and stability decomposition on 48 benchmark data sets. The second approach evaluates bagging predictors on 12 imbalanced data sets using three evaluation metrics: True Positive Rate (TPR), Geometric mean (G-mean) of the accuracies on the majority and minority classes, and the Receiver Operating Characteristic (ROC) curve. Our studies indicate that both stability and robustness are important factors in building high-performance bagging predictors on data with imbalanced class distributions. The experimental results show that PART and Multi-Layer Perceptron (MLP) are the learning algorithms with the best bagging performance on the 12 imbalanced data sets. Moreover, only four of the 12 bagging predictors are statistically superior to single learners on both the G-mean and TPR evaluation metrics over the 12 imbalanced data sets.
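
The abstract names TPR and G-mean but does not spell out their formulas. The following is a minimal sketch, assuming the common definitions: TPR is the fraction of minority (positive) instances classified correctly, and G-mean is the square root of TPR times the true negative rate. The function name `tpr_tnr_gmean`, the choice of label `1` for the minority class, and the toy labels are illustrative assumptions, not taken from the paper.

```python
from math import sqrt


def tpr_tnr_gmean(y_true, y_pred, positive=1):
    """Return (TPR, TNR, G-mean) for a binary problem.

    Assumption: `positive` marks the minority class; every other
    label is treated as the majority (negative) class.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against a class that is absent from y_true.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return tpr, tnr, sqrt(tpr * tnr)


# Toy data: 2 minority instances, 8 majority instances.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

tpr, tnr, gmean = tpr_tnr_gmean(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f} TPR={tpr:.3f} TNR={tnr:.3f} G-mean={gmean:.3f}")
```

On this toy data plain accuracy is 0.8 even though half of the minority instances are missed. TPR and G-mean expose that gap, which is presumably why such metrics are preferred over accuracy when class distributions are imbalanced.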
