
A novel evolutionary preprocessing method based on over-sampling and under-sampling for imbalanced datasets


Abstract

Imbalanced datasets are commonly encountered in real-world classification problems. However, many machine learning algorithms were originally designed for well-balanced datasets. Re-sampling has therefore become an important step in preprocessing imbalanced datasets. It aims to balance a dataset either by increasing the sample size of the smaller class or by decreasing the sample size of the larger class, known as over-sampling and under-sampling respectively. In this paper, a novel sampling strategy based on both over-sampling and under-sampling is proposed, in which new samples of the smaller class are created by the Synthetic Minority Over-sampling Technique (SMOTE). The datasets are then improved by the evolutionary computational method CHC, which works on both the minority class and the majority class samples. The result is a hybrid data preprocessing method that combines over-sampling and under-sampling techniques to re-sample datasets. The method is evaluated by applying the learning algorithm C4.5 to obtain a classification model from the re-sampled datasets. Experimental results show that the proposed approach can decrease the over-sampling rate by about 50% with only around a 3% discrepancy in accuracy.
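The two re-sampling ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `smote_sketch` and `apply_mask` are hypothetical helper names, the binary-mask encoding of a candidate solution is only an assumption about how an evolutionary search such as CHC might select samples from both classes, and the CHC search itself and the C4.5 evaluation are omitted.

```python
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper: each new point lies on the line segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j]))
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def apply_mask(samples, labels, mask):
    """Keep only the samples whose mask bit is 1.

    Assumption: a candidate solution is a binary mask over all samples,
    so dropping majority samples acts as under-sampling and dropping
    synthetic minority samples lowers the over-sampling rate.
    """
    kept = [(x, y) for x, y, bit in zip(samples, labels, mask) if bit]
    return [x for x, _ in kept], [y for _, y in kept]


# Toy data, invented purely for illustration.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
majority = [[3.0, 3.0], [3.1, 2.9], [2.9, 3.2], [3.3, 3.1], [2.8, 2.7], [3.0, 3.4]]

synthetic = smote_sketch(minority, n_new=3, k=2)
samples = majority + minority + synthetic
labels = [0] * len(majority) + [1] * (len(minority) + len(synthetic))

# A hand-written mask standing in for one candidate of the evolutionary search:
# it drops two majority samples and one synthetic minority sample.
mask = [1, 0, 1, 1, 0, 1] + [1, 1, 1] + [1, 1, 0]
X_resampled, y_resampled = apply_mask(samples, labels, mask)
```

In a full pipeline, each mask would be scored by training a classifier on the re-sampled data, and the search would keep the masks that score best; how the paper defines that fitness is not stated in the abstract.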

Bibliographic record

  • Authors

    Wong GY; Leung FHF; Ling SH;

  • Affiliation
  • Year 2013
  • Total pages
  • Original format PDF
  • Text language eng
  • Chinese Library Classification
