Conference: 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management

SCUT: Multi-class imbalanced data classification using SMOTE and cluster-based undersampling

Abstract

Class imbalance is a crucial problem in machine learning and occurs in many domains. Specifically, the two-class problem has received interest from researchers in recent years, leading to solutions for oil spill detection, tumour discovery and fraudulent credit card detection, amongst others. However, handling class imbalance in datasets that contain multiple classes, with varying degrees of imbalance, has received limited attention. In such a multi-class imbalanced dataset, the classification model tends to favour the majority classes and incorrectly classifies instances from the minority classes as belonging to the majority classes, leading to poor predictive accuracy. Further, there is a need both to handle the imbalance between classes and to address the selection of examples within a class (i.e. the so-called within-class imbalance). In this paper, we propose the SCUT hybrid sampling method, which is used to balance the number of training examples in such a multi-class setting. Our SCUT approach oversamples minority class examples through the generation of synthetic examples and employs cluster analysis in order to undersample majority classes. In addition, it handles both within-class and between-class imbalance. Our experimental results on a number of multi-class problems show that, when the SCUT method is used for pre-processing the data before classification, we obtain highly accurate models that compare favourably to the state-of-the-art.
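
The hybrid idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on several assumptions that the abstract does not state: the target size for every class is taken to be the mean class size, the clustering step is a plain k-means (the abstract only says "cluster analysis"), the undersampling draws from the clusters in round-robin order, and all function names and parameters (`scut_like_resample`, `k`, `n_clusters`, `seed`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def smote_like(points, n_new, k, rng):
    """Create n_new synthetic points by SMOTE-style interpolation between a
    point and one of its k nearest neighbours from the same class."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        neighbours = sorted((p for p in points if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        if not neighbours:  # a single-point class can only be duplicated
            synthetic.append(tuple(base))
            continue
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def kmeans(points, n_clusters, rng, n_iter=20):
    """Plain k-means (an assumed stand-in for the paper's cluster analysis)."""
    centres = rng.sample(points, n_clusters)
    clusters = []
    for _ in range(n_iter):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)),
                          key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Recompute each centre as the mean of its members; keep it if empty.
        centres = [tuple(sum(c) / len(members) for c in zip(*members))
                   if members else centre
                   for members, centre in zip(clusters, centres)]
    return [c for c in clusters if c]


def cluster_undersample(points, target, n_clusters, rng):
    """Keep `target` original points, drawn round-robin from the clusters so
    that every cluster (within-class subgroup) stays represented."""
    clusters = kmeans(points, min(n_clusters, len(points)), rng)
    for members in clusters:
        rng.shuffle(members)
    kept = []
    while len(kept) < target:
        for members in clusters:
            if members and len(kept) < target:
                kept.append(members.pop())
    return kept


def scut_like_resample(X, y, k=5, n_clusters=3, seed=0):
    """Balance a multi-class dataset: oversample small classes, undersample
    large ones. Assumption: every class is brought to the mean class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(tuple(features))
    target = round(len(y) / len(by_class))
    X_out, y_out = [], []
    for label, points in by_class.items():
        if len(points) < target:
            points = points + smote_like(points, target - len(points), k, rng)
        elif len(points) > target:
            points = cluster_undersample(points, target, n_clusters, rng)
        X_out.extend(points)
        y_out.extend([label] * len(points))
    return X_out, y_out
```

Calling `scut_like_resample(X, y)` on a list of numeric feature tuples `X` and a list of labels `y` returns a resampled pair in which every class has the same number of examples. Majority-class examples in the output are always original instances, while minority classes gain interpolated ones.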