A study on combining dynamic selection and data preprocessing for imbalance learning

Roy Anandarup; Cruz Rafael M. O.; Sabourin Robert; Cavalcanti George D. C.

Abstract

In real-world settings, classifier learning may encounter a dataset in which one class has far more instances than the others. Such imbalanced datasets require special attention because traditional classifiers generally favor the majority class, which has a large number of instances. Ensemble classifiers have been reported to yield promising results in such cases. Most often, ensembles are specially designed for data-level preprocessing techniques that aim to balance class proportions by applying under-sampling and/or over-sampling. Most available studies concentrate on static ensembles designed for different preprocessing techniques. In contrast to static ensembles, dynamic ensembles have become popular thanks to their performance on ill-defined problems (small datasets). A dynamic ensemble includes a dynamic selection module that chooses the best ensemble for a given test instance. This paper experimentally evaluates the claim that dynamic selection combined with a preprocessing technique can achieve higher performance than static ensembles on imbalanced classification problems. For this evaluation, we collect 84 two-class and 26 multi-class datasets with varying degrees of class imbalance. In addition, we consider five variations of preprocessing methods and four dynamic selection methods. We further design an experimental framework to integrate preprocessing and dynamic selection. Our experiments show that the dynamic ensemble improves the F-measure and the G-mean compared with the static ensemble. Moreover, across different levels of imbalance, dynamic selection methods rank higher than the alternatives. (c) 2018 Elsevier B.V. All rights reserved.
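The abstract reports results in terms of the F-measure and the G-mean but does not define them. As a rough illustration, the sketch below uses the common two-class definitions (F-measure as the harmonic mean of precision and recall on the minority class, G-mean as the square root of sensitivity times specificity). These definitions, the function names, and the toy labels are assumptions made for illustration and are not taken from the paper, which also covers multi-class datasets.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of instances labelled correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """F1 on the positive (minority) class: 2*TP / (2*TP + FP + FN)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy labels: 95 majority-class (0) and 5 minority-class (1) instances.
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
always_majority = [0] * 100

# A classifier that finds 4 of the 5 minority instances
# at the cost of mislabelling 10 majority instances.
minority_aware = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1

for name, pred in [("always_majority", always_majority),
                   ("minority_aware", minority_aware)]:
    print(name,
          round(accuracy(y_true, pred), 3),
          round(f_measure(y_true, pred), 3),
          round(g_mean(y_true, pred), 3))
```

On this toy data the always-majority classifier has the higher accuracy yet scores zero on both the F-measure and the G-mean, which is why these two metrics are commonly preferred over accuracy when classes are imbalanced.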


Bibliographic details

Source
Neurocomputing | 2018, issue 19 | pp. 179-192 | 14 pages
Authors
Roy Anandarup; Cruz Rafael M. O.; Sabourin Robert; Cavalcanti George D. C.
Author affiliations

Univ Quebec, Ecole Technol Super, Montreal, PQ, Canada;

Univ Quebec, Ecole Technol Super, Montreal, PQ, Canada;

Univ Quebec, Ecole Technol Super, Montreal, PQ, Canada;

Univ Fed Pernambuco, Ctr Informat, Recife, PE, Brazil;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
Imbalanced learning; Ensemble of classifiers; Multi-class imbalance; Dynamic ensemble selection; Preprocessing; SMOTE;


Similar literature

1. Dynamic Ensemble Selection and Data Preprocessing for Multi-Class Imbalance Learning [J]. Cruz Rafael M. O., Souza Mariana de Araujo, Sabourin Robert. International Journal of Pattern Recognition and Artificial Intelligence, 2019, issue 11.
2. Preprocessing noisy imbalanced datasets using SMOTE enhanced with fuzzy rough prototype selection [J]. Verbiest Nele, Ramentol Enislay, Cornelis Chris. Applied Soft Computing, 2014.
3. Combining AdaBoost with Preprocessing Algorithms for Extracting Fuzzy Rules from Low Quality Data in Possibly Imbalanced Problems [J]. Ana Palacios, Luciano Sanchez, Ines Couso. International Journal of Uncertainty, Fuzziness, and Knowledge-based Systems, 2012, issue Suppla2.
4. Preprocessing of imbalanced breast cancer data using feature selection combined with over-sampling technique for classification [C]. Jojan Janjira, Srivihok Anongnart. International Conference on Advanced Computer Science and Information Systems, 2013.
5. Reliability Improvement on Feasibility Study for Selection of Infrastructure Projects Using Data Mining and Machine Learning [D]. Hu, Xi. 2020.
6. Smartwatch-Based Eating Detection: Data Selection for Machine Learning from Imbalanced Data with Imperfect Labels [O]. Simon Stankoski, Marko Jordan, Hristijan Gjoreski. 2021.
7. Dynamic Ensemble Selection and Data Preprocessing for Multi-Class Imbalance Learning [O]. Rafael M. O. Cruz, Mariana A. Souza, Robert Sabourin. 2019.
