Conference: Pacific-Asia Conference on Knowledge Discovery and Data Mining

ProWSyn: Proximity Weighted Synthetic Oversampling Technique for Imbalanced Data Set Learning

Abstract

An imbalanced data set creates severe problems for a classifier because the number of samples in one class (the majority) is much higher than in the other class (the minority). Synthetic oversampling methods address this problem by generating new synthetic minority class samples. To distribute the synthetic samples effectively, recent approaches assign weight values to the original minority samples based on their importance and distribute synthetic samples according to those weights. However, most existing algorithms create inappropriate weights, and in many cases they cannot generate the required weight values for the minority samples. This results in a poor distribution of the generated synthetic samples. To address this, this paper presents a new synthetic oversampling algorithm, the Proximity Weighted Synthetic Oversampling Technique (ProWSyn). Our proposed algorithm generates effective weight values for the minority data samples based on each sample's proximity information, i.e., its distance from the boundary, which results in a proper distribution of generated synthetic samples across the minority data set. Simulation results on some real-world data sets show the effectiveness of the proposed method, with improvements in assessment metrics such as AUC, F-measure, and G-mean.
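
The general idea in the abstract, weighting minority samples by proximity to the class boundary and then distributing synthetic samples according to those weights, can be illustrated with a minimal sketch. This is not the paper's exact procedure: the abstract does not give the weighting formula, so the sketch assumes that distance to the nearest majority sample is a usable proxy for distance from the boundary and that weights are inversely proportional to it. The function names `proximity_weights` and `oversample`, the toy data, and the SMOTE-style linear interpolation are all illustrative assumptions.

```python
import numpy as np

def proximity_weights(X_min, X_maj):
    """Assumed weighting: closer to the majority class means a larger weight."""
    # Pairwise Euclidean distances, shape (n_minority, n_majority).
    dists = np.linalg.norm(X_min[:, None, :] - X_maj[None, :, :], axis=2)
    # Distance to the nearest majority sample as a stand-in for
    # "distance from boundary" (an assumption, not the paper's definition).
    nearest = dists.min(axis=1)
    raw = 1.0 / (nearest + 1e-12)  # small constant avoids division by zero
    return raw / raw.sum()         # normalize so the weights sum to 1

def oversample(X_min, X_maj, n_synthetic, seed=0):
    """Generate synthetic minority samples, allocated according to the weights."""
    rng = np.random.default_rng(seed)
    weights = proximity_weights(X_min, X_maj)
    # Pick a base minority sample for each synthetic point, weighted by proximity.
    base = rng.choice(len(X_min), size=n_synthetic, p=weights)
    # Pick a second minority sample uniformly and interpolate between the two.
    partner = rng.integers(0, len(X_min), size=n_synthetic)
    alpha = rng.random((n_synthetic, 1))
    return X_min[base] + alpha * (X_min[partner] - X_min[base])

# Toy data (hypothetical): three minority points on the x-axis,
# two majority points to their left.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
X_maj = np.array([[-1.0, 0.0], [-1.0, 1.0]])

weights = proximity_weights(X_min, X_maj)
synthetic = oversample(X_min, X_maj, n_synthetic=10)
```

In this toy setup the minority point nearest the majority class receives the largest weight, so it is chosen most often as the base for interpolation. A weighting scheme with a different decay would change only `proximity_weights`.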
