Published in: Journal of Ambient Intelligence and Humanized Computing

Comprehensive analysis for class imbalance data with concept drift using ensemble based classification

Abstract

In many information system applications, the environment is dynamic and a tremendous amount of streaming data is generated. Compared with static data mining, this scenario places additional computational demands on the algorithm, which must process incoming instances incrementally using restricted memory and time. Moreover, when streams of data are collected from different sources, they may exhibit concept drift, that is, a change in the distribution of the data over time, and they can also have a high degree of class imbalance. Class imbalance occurs when one class is represented by far fewer examples than the other class. Concept drift and imbalanced streaming data are common in real-world applications such as fraud detection, intrusion detection, decision support systems and disease prediction. In this paper, different concept drift detectors and drift-handling approaches are analysed in the presence of imbalanced data. A comparative analysis of concept drift is performed on various datasets, including the SEA synthetic data stream and real-world datasets. The Massive Online Analysis (MOA) tool is used to compare different learners in a concept-drifting environment. Performance measures such as Accuracy, Precision, Recall, F1-score and the Kappa statistic are used to evaluate the learners on the SEA synthetic data stream and the real-world datasets. Ensemble classifiers and single learners are employed and tested on samples of the SEA synthetic data stream and of the electrical and KDD intrusion datasets. The ensemble classifiers provide better accuracy than the single classifiers, and ensemble-based methods perform well compared with strong single learners when dealing with concept drift and class-imbalanced data.
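The abstract combines three ingredients: a SEA stream whose concept changes abruptly, class imbalance, and the measures used to compare learners. The following is a minimal Python sketch of how these fit together. It is not MOA's implementation and not the authors' experimental setup. Assumptions: the SEA thresholds 8, 9, 7 and 9.5 follow the usual SEA definition; the label convention (class 0 when x1 + x2 <= theta), the `minority_keep` undersampling parameter, the drift period and the majority-class baseline are illustrative choices; and all function names (`sea_stream`, `stream_metrics`, `prequential_majority`) are hypothetical.

```python
import random
from collections import Counter

# SEA concept thresholds: one threshold per concept (assumed standard values).
SEA_THRESHOLDS = [8.0, 9.0, 7.0, 9.5]


def sea_stream(n, drift_every, minority_keep=0.1, seed=1):
    """Yield n (x, y) pairs from a SEA-style stream with abrupt drift.

    x holds three attributes drawn uniformly from [0, 10]; only x[0] and x[1]
    decide the label: y = 0 if x[0] + x[1] <= theta, else 1 (assumed
    convention). theta moves to the next SEA threshold every `drift_every`
    yielded instances. Class-0 instances are kept with probability
    `minority_keep` (hypothetical parameter) so that class 0 is the minority.
    """
    rng = random.Random(seed)
    produced = 0
    while produced < n:
        theta = SEA_THRESHOLDS[(produced // drift_every) % len(SEA_THRESHOLDS)]
        x = [rng.uniform(0, 10) for _ in range(3)]
        y = 0 if x[0] + x[1] <= theta else 1
        if y == 0 and rng.random() >= minority_keep:
            continue  # drop most class-0 instances to induce imbalance
        produced += 1
        yield x, y


def stream_metrics(y_true, y_pred, positive=0):
    """Accuracy, precision/recall/F1 for class `positive`, and Cohen's kappa."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Kappa = (p0 - pc) / (1 - pc): observed agreement p0 (= accuracy) corrected
    # for the agreement pc expected by chance from the two label marginals.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    pc = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    kappa = (accuracy - pc) / (1 - pc) if pc < 1 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa}


def prequential_majority(stream):
    """Test-then-train a majority-class baseline over the stream."""
    seen = Counter()
    y_true, y_pred = [], []
    for _x, y in stream:
        # Test first: predict the most frequent label so far (1 if none seen).
        pred = seen.most_common(1)[0][0] if seen else 1
        y_true.append(y)
        y_pred.append(pred)
        seen[y] += 1  # then train on the revealed label
    return y_true, y_pred


if __name__ == "__main__":
    y_true, y_pred = prequential_majority(sea_stream(n=2000, drift_every=500))
    # Precision, recall and F1 are reported for the minority class 0.
    for name, value in stream_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the majority-class baseline reaches high accuracy while its minority-class recall and its kappa stay close to zero. This illustrates why measures such as Precision, Recall, F1-score and the Kappa statistic are useful alongside Accuracy on imbalanced streams.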