Journal: Soft Computing - A Fusion of Foundations, Methodologies and Applications

Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling

Abstract

In the classification framework there are problems in which the number of examples per class is not equitably distributed, known as imbalanced data sets. This situation is a handicap when trying to identify the minority classes, as learning algorithms are not usually adapted to such characteristics. A common approach to dealing with imbalanced data sets is to apply a preprocessing step. In this paper we analyze the usefulness of data complexity measures for evaluating the behavior of undersampling and oversampling methods. Two classical learning methods, C4.5 and PART, are considered over a wide range of imbalanced data sets built from real data. Specifically, SMOTE-based oversampling techniques and one evolutionary undersampling technique have been selected for the study. We extract behavior patterns from the results in the data complexity space defined by the measures, coding them as intervals. Then, we derive rules from the intervals that describe both good and bad behaviors of C4.5 and PART for the different preprocessing approaches, thus obtaining a complete characterization of the data sets and of the differences between the oversampling and undersampling results.
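For readers unfamiliar with the oversampling side of the study, the following is a minimal sketch of the basic SMOTE idea: each synthetic minority example is placed at a random point on the segment between a minority example and one of its nearest minority neighbors. It is an illustration under stated assumptions, not the implementation used in the paper: the function name `smote`, its parameters, the Euclidean distance, and the toy data are all hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority examples.

    Hypothetical helper for illustration only; assumes numeric features
    and Euclidean distance.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbors of the base point,
        # excluding the base point itself.
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbor = minority[rng.choice(neighbors)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy data (assumed): 4 minority examples against a majority class of 10.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_size = 10
new_points = smote(minority, n_new=majority_size - len(minority), k=2)
balanced_minority = minority + new_points
```

After this call the minority class has as many examples as the majority class. Because every synthetic point lies between two existing minority points, oversampling of this kind changes the shape of the class regions, which is one reason its effect could plausibly depend on the data complexity of the problem.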