Journal: 《计算机技术与发展》 (Computer Technology and Development)

Imbalanced Data Classification Based on KM-SMOTE and Random Forest

         

Abstract

Random forest combined with the SMOTE algorithm handles the classification of imbalanced data sets well; it is a classifier that reaches good classification performance by transforming the data. However, after SMOTE processes imbalanced data, it may change the overall distribution of the data set and blur the boundary between the positive and negative classes. These two defects can easily make the balanced data differ greatly from the original data set, so that classification results improve but remain unsatisfactory. The K-means algorithm clusters data effectively and thereby describes the data distribution. Building on this, the paper combines K-means with SMOTE, exploiting the strengths of both, and proposes KM-SMOTE, a K-means-based algorithm that effectively addresses both problems. Experiments with a random forest classifier show that the improved algorithm yields a more pronounced gain in classification performance.
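The abstract does not spell out how KM-SMOTE generates samples, so the following is only a minimal sketch of one plausible reading: cluster the minority class with K-means, then create each synthetic point by interpolating between a minority sample and the centroid of its own cluster. The interpolation rule, the function names `kmeans` and `km_smote`, and the parameters `k`, `n_new`, and `seed` are all assumptions made for illustration, not details taken from the paper.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Recompute each centroid as the mean of its current members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def km_smote(minority, n_new, k=2, seed=0):
    """Hypothetical KM-SMOTE sketch: oversample the minority class per cluster.

    Assumption: each synthetic point lies on the segment between a randomly
    chosen minority sample and the centroid of that sample's cluster, so new
    points stay inside the region the cluster already occupies.
    """
    rng = random.Random(seed)
    centroids, labels = kmeans(minority, k, seed=seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        centroid = centroids[labels[i]]
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([c + t * (x - c) for x, c in zip(minority[i], centroid)])
    return synthetic


# Toy minority class with two well-separated clumps.
minority = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
            [5.0, 5.0], [5.2, 4.9], [4.9, 5.3]]
new_points = km_smote(minority, n_new=10, k=2, seed=1)
```

Because every synthetic point is a convex combination of a sample and its cluster mean, it cannot fall outside the minority clusters, which is one way to limit the distribution shift and boundary blurring that the abstract attributes to plain SMOTE. The augmented minority set would then be merged with the majority class and passed to a random forest implementation, for example scikit-learn's `RandomForestClassifier`.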
