Soft Computing: A Fusion of Foundations, Methodologies and Applications

Fuzzy integral-based ELM ensemble for imbalanced big data classification

Abstract

Big data are data too large to be handled and analyzed by traditional software tools. They are commonly characterized by five V's: volume, velocity, variety, value and veracity. In the real world, however, some big data have an additional characteristic, class imbalance: e-health big data, credit card fraud detection big data and extreme weather forecasting big data, for example, are all class imbalanced. To classify binary imbalanced big data, this paper proposes a three-stage algorithm based on MapReduce, non-iterative learning, ensemble learning and oversampling. First, for each positive instance, its enemy nearest neighbor (its nearest neighbor from the negative class) is found with MapReduce, and p positive instances are randomly generated from a uniform distribution within its enemy-nearest-neighbor hypersphere; that is, p positive instances are oversampled within the hypersphere. Second, l balanced data subsets are constructed, and l classifiers are trained on them with a non-iterative learning approach. Finally, the trained classifiers are combined by a fuzzy integral to classify unseen instances. We experimentally compared the proposed algorithm with three related algorithms, SMOTE, SMOTE+RF-BigData and MR-V-ELM, and conducted a statistical analysis of the experimental results. The experimental results and the statistical analysis demonstrate that the proposed algorithm outperforms the other three methods.
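The three stages described above can be sketched on a single machine with NumPy. This is a minimal illustration under several assumptions, not the authors' implementation: the MapReduce neighbor search is replaced by an in-memory scan; the enemy-nearest-neighbor hypersphere is assumed to be centered on the positive instance with radius equal to the distance to its nearest negative; the non-iterative learner is a basic sigmoid extreme learning machine (ELM) whose output weights are solved with a pseudo-inverse; each balanced subset is assumed to pair all (original plus synthetic) positives with one of l disjoint chunks of the negatives; and the fuzzy integral is assumed to be a Choquet integral over a Sugeno λ-fuzzy measure whose densities are each classifier's accuracy on its own training subset. All names (`oversample_enemy_hypersphere`, `ELM`, `sugeno_lambda`, `choquet`, `fit_ensemble`, `predict`) are hypothetical.

```python
import numpy as np


def oversample_enemy_hypersphere(X_pos, X_neg, p, rng):
    """Stage 1: draw p synthetic positives uniformly inside a hypersphere per positive.
    Assumption: centre = the positive instance, radius = distance to its nearest negative."""
    d = X_pos.shape[1]
    out = []
    for x in X_pos:
        radius = np.min(np.linalg.norm(X_neg - x, axis=1))  # enemy nearest neighbour distance
        u = rng.standard_normal((p, d))
        u /= np.linalg.norm(u, axis=1, keepdims=True)       # random unit directions
        r = radius * rng.random(p) ** (1.0 / d)             # r*U^(1/d): uniform in the ball
        out.append(x + u * r[:, None])
    return np.vstack(out)


class ELM:
    """Extreme learning machine: random sigmoid hidden layer, output weights by pseudo-inverse."""

    def __init__(self, n_hidden, rng):
        self.n_hidden, self.rng = n_hidden, rng

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ np.eye(2)[y]  # one-hot targets, no iterations
        return self

    def predict_scores(self, X):
        return self._hidden(X) @ self.beta  # shape (n, 2): column 0 = negative, 1 = positive


def sugeno_lambda(g):
    """Root lam > -1, lam != 0, of prod(1 + lam*g_i) = 1 + lam, found by bisection on the
    sign-equivalent F(lam) = sum(log1p(lam*g_i)) - log1p(lam)."""
    if len(g) < 2:
        raise ValueError("need at least two classifiers")
    s = g.sum()
    if abs(s - 1.0) < 1e-12:
        return 0.0
    F = lambda lam: np.sum(np.log1p(lam * g)) - np.log1p(lam)
    lo, hi = (-1.0, 0.0) if s > 1.0 else (0.0, 1.0)
    while s < 1.0 and F(hi) < 0:  # widen until the positive root is bracketed
        hi *= 2.0
    with np.errstate(divide="ignore"):  # log1p(-1) = -inf if mid rounds to -1
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            # s > 1: F > 0 left of the root; s < 1: F > 0 right of the root
            if (F(mid) > 0) == (s > 1.0):
                lo = mid
            else:
                hi = mid
    return 0.5 * (lo + hi)


def choquet(h, g, lam):
    """Choquet integral of supports h (one per classifier) w.r.t. the lambda-fuzzy measure."""
    order = np.argsort(h)[::-1]  # classifiers by decreasing support
    h_sorted = np.append(h[order], 0.0)
    total, g_A = 0.0, 0.0
    for i, j in enumerate(order):
        g_A = g_A + g[j] + lam * g_A * g[j]  # g(A_i) built from g(A_{i-1})
        total += (h_sorted[i] - h_sorted[i + 1]) * g_A
    return total


def fit_ensemble(X, y, l, p, n_hidden, seed=0):
    """Stages 1-2: oversample positives, build l subsets, train one ELM per subset."""
    rng = np.random.default_rng(seed)
    X_neg = X[y == 0]
    X_pos = np.vstack([X[y == 1], oversample_enemy_hypersphere(X[y == 1], X_neg, p, rng)])
    models, acc = [], []
    # Assumption: every subset = all positives + one disjoint chunk of the negatives
    for chunk in np.array_split(rng.permutation(len(X_neg)), l):
        Xs = np.vstack([X_pos, X_neg[chunk]])
        ys = np.r_[np.ones(len(X_pos), int), np.zeros(len(chunk), int)]
        models.append(ELM(n_hidden, rng).fit(Xs, ys))
        acc.append(np.mean(models[-1].predict_scores(Xs).argmax(1) == ys))
    g = np.clip(np.array(acc), 0.01, 0.99)  # Assumption: fuzzy densities = subset accuracy
    return models, g, sugeno_lambda(g)


def predict(models, g, lam, X):
    """Stage 3: fuse the clipped class scores of all ELMs with the Choquet integral."""
    S = np.stack([np.clip(m.predict_scores(X), 0.0, 1.0) for m in models])  # (l, n, 2)
    fused = np.array([[choquet(S[:, i, c], g, lam) for c in range(2)] for i in range(len(X))])
    return fused.argmax(1)
```

With n_pos positives and n_neg negatives, choosing p and l so that (p + 1)·n_pos is roughly n_neg / l keeps each subset close to balanced; for instance, 20 positives, 1,000 negatives, p = 4 and l = 10 give 100 positives and 100 negatives per subset. Training-subset accuracy is an optimistic density estimate; a held-out validation split would be a more careful choice.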