A Dynamic Spark-based Classification Framework for Imbalanced Big Data

Nahla B. Abdel-Hamid; Sally ElGhamrawy; Ali El Desouky; Hesham Arafat

Abstract

Classification of imbalanced big data has attracted extensive attention from researchers over the last decade. Standard classification methods perform poorly on minority class samples. Several approaches have been introduced to address class imbalance in big data and improve generalization in classification. However, most of these approaches neglect the effect of border samples on classification performance; high-impact border samples may be exposed to misclassification. In this paper, a Spark Based Mining Framework (SBMF) is proposed to address the imbalanced data problem. Two main modules are designed for this purpose. The first is the Border Handling Module (BHM), which undersamples the low-impact majority border instances and oversamples the minority class instances. The second is the Selective Border Instances sampling (SBI) Module, which enhances the output of the BHM. The performance of SBMF is evaluated and compared with other recent systems. A number of experiments were performed using moderate and big datasets with different imbalance ratios. The results obtained from SBMF, compared with recent works, show better performance across the different datasets and classifiers.
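The abstract does not say how border instances are detected or what makes one "low impact", so the following is only a minimal single-machine sketch of the general idea (undersample majority border points, then oversample the minority class), not the authors' BHM or SBI modules and not Spark code. It assumes, purely for illustration, that a majority point is a droppable border instance when most of its k nearest neighbours are minority points, and it uses SMOTE-style interpolation between a minority point and one of its nearest minority neighbours. The names k_nearest and rebalance are hypothetical.

```python
import math
import random


def k_nearest(i, points, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    ranked = sorted((math.dist(points[i], p), j)
                    for j, p in enumerate(points) if j != i)
    return [j for _, j in ranked[:k]]


def rebalance(points, labels, minority=1, k=3, seed=0):
    """Drop majority border points, then add synthetic minority points."""
    rng = random.Random(seed)
    kept_pts, kept_labels = [], []
    for i, (p, y) in enumerate(zip(points, labels)):
        if y != minority:
            nbrs = k_nearest(i, points, k)
            n_min = sum(labels[j] == minority for j in nbrs)
            # Assumption (not from the paper): a majority point mostly
            # surrounded by minority points is a border instance to remove.
            if 2 * n_min > len(nbrs):
                continue
        kept_pts.append(p)
        kept_labels.append(y)

    minority_pts = [p for p, y in zip(points, labels) if y == minority]
    n_major = sum(y != minority for y in kept_labels)
    n_new = max(0, n_major - len(minority_pts)) if minority_pts else 0
    for _ in range(n_new):
        # SMOTE-style step: interpolate between a minority point and one
        # of its nearest minority neighbours.
        a = rng.randrange(len(minority_pts))
        nbrs = k_nearest(a, minority_pts, k)
        b = rng.choice(nbrs) if nbrs else a
        t = rng.random()
        kept_pts.append(tuple(x + t * (z - x)
                              for x, z in zip(minority_pts[a], minority_pts[b])))
        kept_labels.append(minority)
    return kept_pts, kept_labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.1, 5.0),
           (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
    labs = [0, 0, 0, 0, 0, 1, 1, 1]
    new_pts, new_labs = rebalance(pts, labs)
    print(new_labs.count(0), new_labs.count(1))
```

The brute-force neighbour search is quadratic in the number of points, which is fine for a toy example but would have to be replaced by a partitioned or approximate search at the data sizes the paper targets.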

Bibliographic record

Source
Journal of Grid Computing | 2018, Issue 4 | 20 pages
Authors
Nahla B. Abdel-Hamid; Sally ElGhamrawy; Ali El Desouky; Hesham Arafat;
Author affiliations

Computers and Systems Department, Faculty of Engineering, Mansoura University;

Computers Engineering Department, MISR Higher Institute for Engineering and Technology;

Computers and Systems Department, Faculty of Engineering, Mansoura University;

Computers and Systems Department, Faculty of Engineering, Mansoura University;

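Indexing information
Original format: PDF
Text language: eng
Chinese Library Classification: computing technology, computer technology;
Keywords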
Big data; Classification; Spark; Sampling methods; Borders; Imbalanced datasets; SMOTE;

