Journal: 《软件》 (Software)

An Imbalanced Data Classification Algorithm Based on Twice Random Forest

Abstract

Classifying imbalanced data sets is an active problem in machine learning. Traditional classifiers are trained to maximize overall accuracy, which tends to lower the recognition rate on the minority class. This paper first reviews the research difficulties and recent progress in imbalanced data classification (IDC) and discusses the evaluation metrics used for classification algorithms. It then proposes a new IDC algorithm that applies the random forest algorithm twice, named TRF. First, a random forest is trained on the training samples to locate the fuzzy boundary; majority-class samples that it misclassifies (that is, predicts as minority class) are removed, which changes the structure of the original training set and yields a new training set. A random forest is then trained again on this new training set. Experiments on UCI data sets show that the new algorithm improves minority-class recall and F-measure on imbalanced data.
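The two-pass procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the function name `fit_trf`, the use of out-of-bag votes to decide which majority samples are misjudged, the tree count, and the synthetic data set (standing in for the UCI data) are all assumptions made here for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split


def fit_trf(X, y, minority_label=1, n_estimators=200, random_state=0):
    """Hypothetical two-pass random forest: filter, then retrain."""
    # Pass 1: fit a forest and collect out-of-bag (OOB) votes per sample.
    # Assumption: OOB votes are used instead of plain training-set
    # predictions, because a forest usually fits its own training data
    # almost perfectly and would flag very few samples.
    first = RandomForestClassifier(
        n_estimators=n_estimators, oob_score=True, random_state=random_state
    ).fit(X, y)
    oob_pred = first.classes_[np.argmax(first.oob_decision_function_, axis=1)]

    # Remove majority-class samples that pass 1 predicts as minority class.
    misjudged_majority = (y != minority_label) & (oob_pred == minority_label)
    keep = ~misjudged_majority

    # Pass 2: retrain a forest on the filtered training set.
    second = RandomForestClassifier(
        n_estimators=n_estimators, random_state=random_state
    ).fit(X[keep], y[keep])
    return second, keep


# Synthetic imbalanced data (about 10% minority), an assumption for the demo.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], flip_y=0.02,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model, keep = fit_trf(X_train, y_train)
y_pred = model.predict(X_test)

# The metrics named in the abstract, computed for the minority class.
minority_recall = recall_score(y_test, y_pred, pos_label=1)
minority_f1 = f1_score(y_test, y_pred, pos_label=1)
print(f"removed {np.sum(~keep)} majority samples")
print(f"minority recall={minority_recall:.3f}, F1={minority_f1:.3f}")
```

Comparing these scores against a single `RandomForestClassifier` trained on the unfiltered `X_train` would show whether the filtering step helps on a given data set; the sketch makes no claim about how large that effect is.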
