Conference: International Conference on Advanced Data Mining and Applications (ADMA 2011)

A Normal Distribution-Based Over-Sampling Approach to Imbalanced Data Classification

Abstract

This study proposes a normal distribution-based over-sampling approach to balance the number of instances belonging to different classes in a data set. The balanced training data are then used to learn unbiased classifiers for the original data set. Under some conditions, the proposed over-sampling approach generates samples whose expected mean and variance are similar to those of the original minority class data. Because the approach tries to generate synthetic data with a probability distribution similar to that of the original data, and expands the class boundaries of the minority class, it may improve classification performance on the minority class. Experimental results show that, when used with several classical classification algorithms, the proposed approach outperforms alternative methods on benchmark data sets most of the time.
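The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea of drawing synthetic minority samples from a normal distribution. It is not the authors' algorithm. It assumes, purely for illustration, that each feature is modelled independently with the sample mean and sample standard deviation of the minority class; the function names `normal_oversample` and `balance` are hypothetical.

```python
import random
import statistics


def normal_oversample(minority, n_new, seed=0):
    """Draw n_new synthetic rows from per-feature normal distributions.

    Assumption (not taken from the paper): features are treated as
    independent, each with the minority class's sample mean and
    sample standard deviation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to estimate a variance")
    rng = random.Random(seed)  # local generator so results are reproducible
    columns = list(zip(*minority))  # transpose rows into feature columns
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return [
        [rng.gauss(mu, sigma) for mu, sigma in zip(means, stdevs)]
        for _ in range(n_new)
    ]


def balance(majority, minority, seed=0):
    """Return the minority rows plus enough synthetic rows to match the majority count."""
    deficit = max(0, len(majority) - len(minority))
    return list(minority) + normal_oversample(minority, deficit, seed)
```

Calling `balance(majority, minority)` with lists of numeric feature rows returns a minority set with as many rows as the majority class, with the original minority rows first. Because the synthetic rows are drawn from a distribution fitted to the minority data, their expected per-feature mean and variance match the fitted values, which mirrors the property the abstract describes only in this simplified, independent-feature setting.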
