Conference: International Conference on Machine Learning and Cybernetics

A Three-stage Method for Classification of Binary Imbalanced Big Data



Abstract

Imbalanced data classification problems are common in practice, for example in extreme weather prediction, software defect prediction, machinery fault diagnosis, and spam filtering, so studying them has both theoretical and practical value. This paper proposes a three-stage method for classifying binary imbalanced big data. In the first stage, the negative-class big data is partitioned into K clusters by the K-means algorithm on the Hadoop platform. In the second stage, an instance selection method picks the important samples from each cluster in parallel, yielding K negative-class subsets. In the third stage, K balanced training sets are constructed, each combining a negative-class subset with the positive-class subset; K classifiers are trained on them and then integrated to classify unseen samples. Experiments compare the proposed method with two state-of-the-art methods in terms of G-mean. The results show that the proposed method is more effective and more efficient than the compared approaches.
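The three stages can be illustrated with a minimal single-machine sketch in Python. It is not the authors' implementation: the abstract does not name the instance selection method, the base classifier, or the integration rule, so the sketch assumes nearest-to-centroid selection, a nearest-centroid classifier, and majority voting, and it runs in one process rather than on Hadoop. All function names are hypothetical. The `g_mean` helper uses the common definition of G-mean as the geometric mean of sensitivity and specificity, which is assumed to be the metric the abstract refers to.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans(points, k, iters=20, seed=0):
    """Stage 1 (sketch): plain Lloyd's K-means; returns the non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center when a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [c for c in clusters if c]

def select_instances(cluster, m):
    """Stage 2 (assumed rule): keep the m points closest to the cluster centroid."""
    center = mean(cluster)
    return sorted(cluster, key=lambda p: dist2(p, center))[:m]

def train_centroid_classifier(neg, pos):
    """Assumed base learner: predict 1 (positive) if closer to the positive centroid."""
    c_neg, c_pos = mean(neg), mean(pos)
    return lambda x: 1 if dist2(x, c_pos) < dist2(x, c_neg) else 0

def three_stage_fit(neg, pos, k):
    """Cluster negatives, select a subset per cluster, train one model per subset."""
    clusters = kmeans(neg, k)                                      # stage 1
    subsets = [select_instances(c, len(pos)) for c in clusters]    # stage 2
    return [train_centroid_classifier(s, pos) for s in subsets]    # stage 3

def predict(models, x):
    """Integrate the classifiers by majority vote (assumed integration rule)."""
    votes = sum(model(x) for model in models)
    return 1 if 2 * votes > len(models) else 0

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (positive recall) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)
```

To try it, pass a list of negative-class points and a list of positive-class points (tuples of floats) to `three_stage_fit`, then call `predict(models, x)` on a new point. Capping each negative subset at `len(pos)` is one simple way to make every training set balanced; the selection size used in the paper is not stated in the abstract.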
