Journal: Expert Systems with Applications

DEBOHID: A differential evolution based oversampling approach for highly imbalanced datasets


Abstract

The class distribution of the samples in a dataset is one of the critical factors affecting classification success. Classifiers trained on imbalanced datasets classify majority-class samples more successfully than minority-class samples. Oversampling, which increases the number of minority-class samples, is a frequently used way to overcome class imbalance. Over more than two decades, many oversampling methods have been proposed for the class imbalance problem. Differential Evolution (DE) is a metaheuristic algorithm that has achieved successful results in many domains. One of the main reasons for this success is that DE has an effective mechanism for generating candidate individuals. In this work, we propose a novel oversampling method for highly imbalanced datasets based on the differential evolution algorithm, named DEBOHID (A differential evolution based oversampling approach for highly imbalanced datasets). To demonstrate the success of DEBOHID, 44 highly imbalanced datasets are used in the experiments. The results are compared with nine state-of-the-art oversampling methods. To show that the experimental results do not depend on the choice of classifier, Support Vector Machines (SVM), k-Nearest Neighbor (kNN), and Decision Tree (DT) are used as classifiers in the experiments. AUC and G-Mean are used as the performance metrics. The experimental results and statistical analyses demonstrate the success of DEBOHID.
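The abstract does not spell out how DE's candidate generation is turned into an oversampler, so the following is only a minimal sketch of the general idea, not the authors' DEBOHID procedure. It assumes the classic DE/rand/1 mutation with binomial crossover applied to minority-class samples only. The function name `de_oversample`, the parameter defaults `F=0.5` and `CR=0.9`, and the clipping of synthetic points to the minority class's feature range are all illustrative assumptions.

```python
import numpy as np


def de_oversample(X_min, n_new, F=0.5, CR=0.9, seed=0):
    """Create n_new synthetic minority samples with DE/rand/1/bin-style steps.

    Hypothetical helper for illustration; F (scale factor) and CR (crossover
    rate) defaults are assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n, d = X_min.shape
    if n < 4:
        raise ValueError("need at least four minority samples")

    lo, hi = X_min.min(axis=0), X_min.max(axis=0)
    synthetic = np.empty((n_new, d))
    for i in range(n_new):
        # Cycle through the minority samples as DE "target" vectors.
        t = i % n
        # Pick three distinct samples, all different from the target.
        others = np.delete(np.arange(n), t)
        r1, r2, r3 = rng.choice(others, size=3, replace=False)
        # DE/rand/1 mutation: base vector plus a scaled difference vector.
        mutant = X_min[r1] + F * (X_min[r2] - X_min[r3])
        # Binomial crossover; one coordinate is always taken from the mutant.
        mask = rng.random(d) < CR
        mask[rng.integers(d)] = True
        trial = np.where(mask, mutant, X_min[t])
        # Assumption: keep synthetic points inside the minority feature range.
        synthetic[i] = np.clip(trial, lo, hi)
    return synthetic


if __name__ == "__main__":
    X_minority = np.array(
        [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    )
    X_new = de_oversample(X_minority, n_new=10)
    print(X_new.shape)  # (10, 2)
```

The synthetic rows would then be appended to the training set with the minority label before fitting a classifier such as SVM, kNN, or a decision tree. Unlike a full DE run, this sketch has no selection step based on a fitness function. It only borrows the mutation and crossover operators.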
