Conference: International Conference on Advanced Data Mining and Applications

Adaptive Multi-objective Swarm Crossover Optimization for Imbalanced Data Classification



Abstract

Training a classifier on an imbalanced dataset, in which the majority class contributes more data than the minority class, is a well-known problem in the data mining research community. The resulting classifier tends to be under-fitted when recognizing test instances of the minority class and over-fitted to the overwhelming number of mediocre samples from the majority class. Many techniques have been tried, ranging from artificially boosting the number of minority class training samples (for example with SMOTE), through downsizing the volume of majority class samples, to modifying the classification induction algorithm in favour of the minority class. However, finding the optimal ratio between majority and minority class samples for building the most accurate classifier is tricky because of the non-linear relationships between the attributes and the class labels. Merely rebalancing the sample sizes of the two classes to exact proportions often does not produce the best result. A brute-force search for the perfect combination of majority and minority class samples that yields the best classification result is NP-hard. In this paper, a unified preprocessing approach is proposed that uses stochastic swarm heuristics to cooperatively optimize the mixture of the two classes by progressively rebuilding the training dataset. Our novel approach is shown to outperform existing popular methods.
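The rebalancing idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract gives no implementation details, so plain random search stands in for the swarm heuristics, and the synthetic minority points come from a simplified SMOTE-like interpolation that picks a random partner rather than a nearest neighbour. The names undersample, oversample_interpolate, search_mixture, and the caller-supplied fitness function are all hypothetical.

```python
import random


def undersample(majority, n_keep, rng):
    """Randomly keep n_keep majority samples (without replacement)."""
    return rng.sample(majority, min(n_keep, len(majority)))


def oversample_interpolate(minority, n_new, rng):
    """Append n_new synthetic minority points to the original ones.

    Simplification (assumption): real SMOTE interpolates toward one of the
    k nearest minority neighbours; here the partner is picked uniformly at
    random from the minority class. Needs at least two minority samples.
    """
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()  # position along the segment from a to b
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return minority + synthetic


def search_mixture(majority, minority, fitness, n_trials=20, seed=0):
    """Random search over class mixtures (a stand-in for the swarm search).

    fitness(maj, mino) is a hypothetical caller-supplied score, for example
    the cross-validated accuracy of a classifier trained on the two sets.
    Assumes len(majority) >= len(minority) >= 2.
    """
    rng = random.Random(seed)
    best_score, best_sets = float("-inf"), None
    for _ in range(n_trials):
        # Draw a candidate mixture: how many majority samples to keep and
        # how many synthetic minority samples to add.
        n_keep = rng.randint(len(minority), len(majority))
        n_new = rng.randint(0, len(majority) - len(minority))
        maj = undersample(majority, n_keep, rng)
        mino = oversample_interpolate(minority, n_new, rng)
        score = fitness(maj, mino)
        if score > best_score:
            best_score, best_sets = score, (maj, mino)
    return best_score, best_sets
```

Each trial rebuilds the training set from scratch and keeps the mixture with the highest score, so the quality of the result depends entirely on the fitness function that the caller provides. A real experiment would replace the random draws with the swarm updates described in the paper.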
