Journal: 计算机与现代化 (Computer and Modernization)

An Ensemble Classification Method for Skewed Data Streams Based on Accumulated Positive Samples

Abstract

Existing methods for handling skewed (imbalanced) data streams tend to overfit or fail to make full use of the available data. To address this problem, an ensemble classification method for imbalanced data streams based on accumulated positive samples, named EAMIDS, is proposed. EAMIDS collects the positive samples from all data chunks that have arrived so far into a set AP, and then uses the KNN algorithm and over-sampling to balance the class distribution of each data chunk. When the number of base classifiers exceeds the maximum ensemble size, the ensemble classifier is updated according to the F-Measure values. Experiments on the synthetic datasets SEA and SPH show that EAMIDS achieves higher accuracy than the IDSL and SMOTE algorithms.
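The three steps named in the abstract (accumulating positives into AP, balancing a chunk with KNN plus over-sampling, and pruning the ensemble by F-Measure) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how KNN selects samples from AP or how over-sampling is performed, so the scheme below (borrow each current positive's k nearest neighbours from AP, then duplicate at random) is an assumption. The function names `f_measure`, `balance_chunk`, and `update_ensemble` are hypothetical, and base classifiers are modelled as plain callables that return 0 or 1.

```python
import math
import random


def f_measure(y_true, y_pred):
    """F1 score of the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def balance_chunk(pos, neg, ap, k=3, seed=0):
    """Grow the positive set of a chunk towards len(neg).

    Assumed scheme (not specified in the abstract): for each current
    positive, borrow its k nearest neighbours from the accumulated set
    AP; if positives are still short, over-sample by random duplication.
    """
    rng = random.Random(seed)
    balanced = list(pos)
    borrowed = set()
    for p in pos:
        # Indices of the k accumulated positives closest to p.
        nearest = sorted(range(len(ap)), key=lambda i: math.dist(p, ap[i]))[:k]
        for i in nearest:
            if i not in borrowed and len(balanced) < len(neg):
                borrowed.add(i)
                balanced.append(ap[i])
    # Over-sampling step: duplicate positives until the classes match.
    while balanced and len(balanced) < len(neg):
        balanced.append(rng.choice(balanced))
    return balanced


def update_ensemble(ensemble, new_clf, X, y, max_size):
    """Add new_clf; if the ensemble is too large, keep the best by F-Measure."""
    candidates = ensemble + [new_clf]
    if len(candidates) <= max_size:
        return candidates
    ranked = sorted(
        candidates,
        key=lambda clf: f_measure(y, [clf(x) for x in X]),
        reverse=True,
    )
    return ranked[:max_size]
```

In a streaming loop, one would call `balance_chunk` on each arriving chunk, train a base classifier on the balanced chunk, append the chunk's own positives to `ap`, and then pass the new classifier to `update_ensemble`. Evaluating F-Measure on the current chunk is likewise an assumption; the abstract does not state which data the score is computed on.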
