Journal: Applied Mathematical Sciences

K-Neighbor over-sampling with cleaning data: a new approach to improve classification performance in data sets with class imbalance

Abstract

Class imbalance has attracted the attention of researchers in recent years. It occurs when the groups or levels of the response variable are represented in unequal proportions in the data. The class with the dominant proportion is called the majority class, while the class with the small proportion is known as the minority class. Class imbalance biases parameter estimation in parametric models such as logistic regression and leads to a high misclassification rate for the minority class, which in turn might create risks in policymaking. To overcome these problems, this paper proposes a new approach called K-Neighbor Over-sampling (KNOS) with cleaning data. Unlike other methods such as SMOTE, Border-line SMOTE (BLS), and Safe Level-SMOTE (SLS), KNOS employs K minority class observations to generate synthetic data and then proceeds further by removing some majority class observations. Like BLS and SLS, KNOS generates synthetic samples only from original samples that are safe. In this paper, KNOS is applied to the logistic regression classification method. The results show that KNOS achieves higher performance in terms of AUC, G-Mean, and sensitivity than BLS and SLS. Moreover, the study shows that KNOS produces more consistent results than the other approaches.
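
The three evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It uses the standard definitions (sensitivity as the true positive rate, G-Mean as the square root of sensitivity times specificity, and AUC as the probability that a minority sample is scored above a majority sample). The function names and the toy labels and scores are assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
import math


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred, positive)
    return math.sqrt(sens * spec)


def auc(y_true, scores, positive=1):
    """AUC as the fraction of (minority, majority) pairs ranked correctly.

    Ties between a minority and a majority score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 minority (1) and 8 majority (0) observations.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.1, 0.2, 0.3, 0.35, 0.05, 0.15, 0.25, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold of 0.5 is an assumption

sens, spec = sensitivity_specificity(y_true, y_pred)
gm = g_mean(y_true, y_pred)
area = auc(y_true, scores)
print(sens, spec, gm, area)
```

On this toy data only one of the two minority observations is detected, so sensitivity is low even though most observations are classified correctly. This is the kind of minority-class misclassification that accuracy alone would hide and that sensitivity and G-Mean are meant to expose.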