Journal: IEEE Transactions on Knowledge and Data Engineering

MWMOTE--Majority Weighted Minority Oversampling Technique for Imbalanced Data Set Learning


Abstract

Imbalanced learning problems involve an unequal distribution of data samples among different classes and pose a challenge to any classifier, because the minority class samples become hard to learn. Synthetic oversampling methods address this problem by generating synthetic minority class samples to balance the distribution between the majority and minority classes. This paper identifies that most existing oversampling methods may generate wrong synthetic minority samples in some scenarios and thereby make learning tasks harder. To this end, a new method, called Majority Weighted Minority Oversampling TEchnique (MWMOTE), is presented for efficiently handling imbalanced learning problems. MWMOTE first identifies the hard-to-learn, informative minority class samples and assigns them weights according to their Euclidean distance from the nearest majority class samples. It then generates synthetic samples from the weighted informative minority class samples using a clustering approach, in such a way that all the generated samples lie inside some minority class cluster. MWMOTE has been evaluated extensively on four artificial and 20 real-world data sets. The simulation results show that the method is better than or comparable with some other existing methods in terms of various assessment metrics, such as the geometric mean (G-mean) and the area under the receiver operating characteristic (ROC) curve, usually known as the area under the curve (AUC).
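The two ideas named in the abstract, distance-based weighting and cluster-constrained generation, can be illustrated with a small sketch. This is not the MWMOTE algorithm as published: the abstract does not give the weighting formula or the clustering procedure, so the inverse-distance weights, the radius-based connected-component clustering, and the names `weighted_cluster_oversample` and `cluster_radius` are all assumptions made for illustration.

```python
import numpy as np


def weighted_cluster_oversample(X_min, X_maj, n_new, cluster_radius=1.0, seed=0):
    """Illustrative sketch only; NOT the published MWMOTE procedure."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)

    # Euclidean distance from each minority sample to its nearest majority sample.
    d_maj = np.linalg.norm(X_min[:, None, :] - X_maj[None, :, :], axis=2).min(axis=1)

    # ASSUMPTION: inverse-distance weights, so samples near the majority class
    # (treated here as the hard-to-learn ones) are selected more often.
    weights = 1.0 / (d_maj + 1e-12)
    probs = weights / weights.sum()

    # ASSUMPTION: stand-in clustering. Minority samples closer than
    # cluster_radius are linked, and each connected component is one cluster.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    labels = -np.ones(len(X_min), dtype=int)
    current = 0
    for i in range(len(X_min)):
        if labels[i] >= 0:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            j = stack.pop()
            for k in np.flatnonzero((d_min[j] <= cluster_radius) & (labels < 0)):
                labels[k] = current
                stack.append(k)
        current += 1

    # Pick a seed sample by weight, pick a partner from the same cluster,
    # and interpolate on the segment between the two.
    synthetic = np.empty((n_new, X_min.shape[1]))
    for t in range(n_new):
        i = rng.choice(len(X_min), p=probs)
        j = rng.choice(np.flatnonzero(labels == labels[i]))
        alpha = rng.random()
        synthetic[t] = X_min[i] + alpha * (X_min[j] - X_min[i])
    return synthetic, probs, labels
```

Because each synthetic point is interpolated between two members of the same cluster, it cannot land between distant minority groups, which is the failure mode the abstract attributes to existing oversampling methods.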
