International Conference on Knowledge Discovery and Information Retrieval

SCUT: Multi-Class Imbalanced Data Classification using SMOTE and Cluster-based Undersampling

Abstract

Class imbalance is a crucial problem in machine learning and occurs in many domains. Specifically, the two-class problem has received interest from researchers in recent years, leading to solutions for oil spill detection, tumour discovery and fraudulent credit card detection, amongst others. However, handling class imbalance in datasets that contain multiple classes, with varying degrees of imbalance, has received limited attention. In such a multi-class imbalanced dataset, the classification model tends to favour the majority classes and incorrectly classifies instances from the minority classes as belonging to the majority classes, leading to poor predictive accuracy. Further, there is a need both to handle the imbalance between classes and to address the selection of examples within a class (i.e. the so-called within-class imbalance). In this paper, we propose the SCUT hybrid sampling method, which is used to balance the number of training examples in such a multi-class setting. Our SCUT approach oversamples minority class examples through the generation of synthetic examples and employs cluster analysis in order to undersample majority classes. In addition, it handles both within-class and between-class imbalance. Our experimental results on a number of multi-class problems show that, when the SCUT method is used for pre-processing the data before classification, we obtain highly accurate models that compare favourably to the state-of-the-art.
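
The hybrid sampling idea described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The abstract does not state the target class size, the clustering algorithm, or the SMOTE parameters, so the sketch assumes the mean class size as the target, a plain k-means as a stand-in for the cluster analysis, and a simplified SMOTE-style interpolation. All function names (`scut_like`, `smote_like`, `cluster_undersample`, `kmeans`) and parameters are hypothetical.

```python
import random
from collections import defaultdict

def _dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote_like(points, n_new, k, rng):
    """Simplified SMOTE-style step: interpolate between a point and one
    of its k nearest same-class neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        others = [p for p in points if p is not base]
        neighbours = sorted(others, key=lambda p: _dist2(base, p))[:k]
        if not neighbours:              # class with a single example
            synthetic.append(list(base))
            continue
        other = rng.choice(neighbours)
        gap = rng.random()              # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic

def kmeans(points, n_clusters, rng, n_iter=20):
    """Plain k-means (assumed stand-in for the paper's cluster analysis)."""
    centres = [list(p) for p in rng.sample(points, n_clusters)]
    clusters = []
    for _ in range(n_iter):
        clusters = [[] for _ in centres]
        for p in points:
            idx = min(range(len(centres)), key=lambda i: _dist2(p, centres[i]))
            clusters[idx].append(p)
        for i, members in enumerate(clusters):
            if members:
                centres[i] = [sum(col) / len(members) for col in zip(*members)]
    return [c for c in clusters if c]

def cluster_undersample(points, n_keep, n_clusters, rng):
    """Keep n_keep points, drawing round-robin from each cluster so that
    every region of the class stays represented."""
    n_keep = min(n_keep, len(points))
    clusters = kmeans(points, min(n_clusters, len(points)), rng)
    for c in clusters:
        rng.shuffle(c)
    kept = []
    while len(kept) < n_keep:
        for c in clusters:
            if c and len(kept) < n_keep:
                kept.append(c.pop())
    return kept

def scut_like(X, y, k=5, n_clusters=3, seed=0):
    """Balance every class to the mean class size (an assumption)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(list(features))
    target = round(sum(len(v) for v in by_class.values()) / len(by_class))
    X_out, y_out = [], []
    for label, points in by_class.items():
        if len(points) > target:        # majority class: undersample by cluster
            points = cluster_undersample(points, target, n_clusters, rng)
        elif len(points) < target:      # minority class: add synthetic examples
            points = points + smote_like(points, target - len(points), k, rng)
        X_out.extend(points)
        y_out.extend([label] * len(points))
    return X_out, y_out
```

Calling `scut_like(X, y)` on a list of feature vectors `X` and labels `y` returns a resampled training set in which every class has the same number of examples. Drawing evenly from clusters when undersampling is one way to address within-class imbalance, while equalising class sizes addresses between-class imbalance.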
