Conference: International Conference on Big Data and Artificial Intelligence

An Improved Unbalanced Data Classification Method Based on Hybrid Sampling Approach

Abstract

The problem of data imbalance has received widespread attention because it can reduce classification accuracy in machine learning. Because traditional classifiers tend to ignore minority-class instances, it is necessary to improve the recognition rate of minority instances. This paper therefore proposes a new hybrid sampling method that addresses the data imbalance problem by enlarging the proportion of minority instances. For the oversampling part, a variant of SMOTE is provided that combines LR-SMOTE and CCR (Combined Cleaning and Resampling Algorithm); for the under-sampling part, the Tomek-link method is used. After this pre-processing stage, the data set is classified by Random Forest (RF). Experimental results show that the proposed algorithm improves the performance of RF, yielding higher accuracy on the data set.
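The general shape of such a hybrid sampling stage can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify how LR-SMOTE and CCR are combined, so plain SMOTE interpolation is swapped in for the oversampling step, and all function names (`nearest`, `smote`, `remove_tomek_links`, `hybrid_resample`) are hypothetical. The Random Forest classification step is omitted; the resampled data would be passed to any classifier afterwards.

```python
import math
import random


def nearest(idx, points, candidates, k=1):
    """Return the k candidate indices closest to points[idx], excluding idx itself."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority, n_new, k=3, seed=0):
    """Plain SMOTE (a stand-in, NOT the paper's LR-SMOTE/CCR variant).

    Each synthetic point is interpolated between a random minority point
    and one of its k nearest minority neighbours. Needs >= 2 minority points.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(min_idx)
        j = rng.choice(nearest(i, X, min_idx, k))
        gap = rng.random()  # position along the segment from X[i] to X[j]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
    return X + synthetic, y + [minority] * n_new


def remove_tomek_links(X, y, majority):
    """Drop the majority member of every Tomek link.

    A Tomek link is a pair of points with different labels that are
    each other's nearest neighbour.
    """
    all_idx = range(len(X))
    nn = [nearest(i, X, all_idx, 1)[0] for i in all_idx]
    drop = {i for i in all_idx
            if nn[nn[i]] == i and y[i] != y[nn[i]] and y[i] == majority}
    keep = [i for i in all_idx if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def hybrid_resample(X, y, minority, majority, seed=0):
    """Oversample the minority class to parity, then clean with Tomek links."""
    n_min = sum(1 for label in y if label == minority)
    n_maj = sum(1 for label in y if label == majority)
    X_over, y_over = smote(X, y, minority, max(0, n_maj - n_min), seed=seed)
    return remove_tomek_links(X_over, y_over, majority)
```

Here `X` is a list of numeric tuples and `y` a list of labels. Oversampling to exact parity is an assumption of this sketch; the abstract only says the proportion of minority instances is enlarged.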
