2011 10th International Conference on Machine Learning and Applications and Workshops (ICMLA)

Feature Selection Metric Using AUC Margin for Small Samples and Imbalanced Data Classification Problems



Abstract

Feature selection helps us address problems with high dimensionality by retaining only those features that are most important for the classification task. However, traditional feature selection methods fail to account for imbalanced class distributions, leading to poor predictions for minority-class samples. Recently, there has been growing interest in the Area Under the ROC Curve (AUC) metric because it provides a meaningful performance measure in the presence of imbalanced data. In this paper, we propose a new margin-based feature selection metric that defines the quality of a set of features by the maximized AUC margin it induces during learning with boosting. Our algorithm measures the cumulative effect each feature has on the margin distribution of the weighted linear combination that boosting produces over the positive and negative examples. Experiments on various real imbalanced data sets show the effectiveness of our algorithm at selecting informative features from small data sets with skewed class distributions.
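To make the idea concrete, below is a minimal, hypothetical Python sketch of a boosting-driven AUC-margin score per feature: an AdaBoost-style loop over decision stumps, where each round's increase in the mean score gap between positive and negative examples is credited to the chosen stump's feature. The function name `auc_margin_feature_scores`, the use of the mean score gap as a proxy for the pairwise AUC margin, and all parameters are illustrative assumptions, not the authors' exact metric.

```python
import numpy as np


def auc_margin_feature_scores(X, y, n_rounds=20):
    """Illustrative sketch (not the paper's exact metric): score each feature by
    its cumulative contribution to the positive-vs-negative margin built up by
    an AdaBoost-style ensemble of decision stumps."""
    y_pm = np.where(y == 1, 1, -1)              # labels in {-1, +1}
    n_samples, n_features = X.shape
    w = np.full(n_samples, 1.0 / n_samples)     # AdaBoost example weights
    f = np.zeros(n_samples)                     # current ensemble score per example
    scores = np.zeros(n_features)               # per-feature margin contributions

    for _ in range(n_rounds):
        # exhaustively pick the best stump (feature, threshold, sign)
        best_err, best = np.inf, None
        for j in range(n_features):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    h = np.where(sign * (X[:, j] - thr) > 0, 1, -1)
                    err = np.sum(w[h != y_pm])
                    if err < best_err:
                        best_err, best = err, (j, thr, sign)
        j, thr, sign = best
        err = np.clip(best_err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weak-learner weight
        h = np.where(sign * (X[:, j] - thr) > 0, 1, -1)

        # credit this round's increase in the mean positive/negative score gap
        # (a simple proxy for the AUC margin) to the stump's feature
        before = f[y == 1].mean() - f[y == 0].mean()
        f = f + alpha * h
        after = f[y == 1].mean() - f[y == 0].mean()
        scores[j] += after - before

        # standard AdaBoost reweighting
        w = w * np.exp(-alpha * y_pm * h)
        w = w / w.sum()

    return scores                               # higher = more AUC-margin contribution


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(80, 5))
    y = (X[:, 2] > 1.0).astype(int)             # imbalanced: few positives, feature 2 informative
    print(auc_margin_feature_scores(X, y))
```

The mean score gap is used here only because it is cheap to track per round; a faithful implementation would instead accumulate the change in the full pairwise margin f(x+) - f(x-) over all positive-negative pairs, which is what the AUC actually summarizes.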
