Journal: International Journal of Performability Engineering

Spark-based Ensemble Learning for Imbalanced Data Classification

Abstract

With the rapid expansion of Big Data across science and engineering domains, imbalanced data classification has become an increasingly acute problem in real-world datasets. It is notably difficult to build an efficient model by mechanically applying existing data mining and machine learning algorithms. In this paper, we propose a Spark-based Ensemble Learning approach for imbalanced data classification (SELidc for short). The key points of SELidc are preprocessing to balance the imbalanced dataset, and building a distributed ensemble learning algorithm to improve performance and reduce overfitting on large, imbalanced data. SELidc first converts the original imbalanced dataset into resilient distributed datasets (RDDs). Next, in the sampling process, it samples according to a comprehensive weight, which is derived from the weight of each class within the majority class and the number of minority-class samples. It then trains several random forest classifiers in the Spark environment, using correlation-based feature selection. Experiments on publicly available UCI datasets and other datasets show that SELidc achieves better results than related approaches across various evaluation metrics, and that it makes full use of the computing power of the Spark distributed platform when training on massive data.
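
The sampling step described in the abstract can be illustrated with a small sketch. The abstract does not give the formula for the comprehensive weight, so the code below rests on an assumption: each class inside the majority class gets a quota equal to its share of the majority samples multiplied by the number of minority samples. The function name `balanced_undersample` and the label names are hypothetical, and the sketch uses plain Python rather than Spark, so it shows only the balancing idea, not the distributed implementation.

```python
import random
from collections import Counter, defaultdict


def balanced_undersample(samples, minority_label, seed=0):
    """Undersample the majority class so it roughly matches the minority class.

    samples: list of (features, label) pairs. Every label other than
    minority_label is treated as one class inside the majority class.
    ASSUMPTION: quota per majority class = its share of all majority
    samples * number of minority samples. This is not the paper's formula.
    """
    rng = random.Random(seed)
    minority = [s for s in samples if s[1] == minority_label]

    # Group the majority samples by their own class label.
    groups = defaultdict(list)
    for s in samples:
        if s[1] != minority_label:
            groups[s[1]].append(s)
    total_majority = sum(len(g) for g in groups.values())

    picked = []
    for label, group in groups.items():
        # Share of the majority class, scaled by the minority count.
        quota = round(len(minority) * len(group) / total_majority)
        # Keep at least one sample, and never ask for more than exist.
        quota = min(len(group), max(1, quota))
        picked.extend(rng.sample(group, quota))

    return minority + picked


# Toy data: 10 minority samples, 100 majority samples in two classes.
data = (
    [((i,), "pos") for i in range(10)]
    + [((i,), "neg_a") for i in range(60)]
    + [((i,), "neg_b") for i in range(40)]
)
balanced = balanced_undersample(data, "pos")
print(Counter(label for _, label in balanced))
```

On the toy data, the two majority classes hold 60% and 40% of the majority samples, so they contribute 6 and 4 samples against 10 minority samples. In a Spark setting, the same per-class fractions could in principle be passed to a stratified sampling call, but that mapping is a suggestion and not something the abstract states.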