Benchmarking framework for class imbalance problem using novel sampling approach for big data

Khyati Ahlawat; Anuradha Chug; Amit Prakash Singh

Abstract

Traditional machine learning techniques need to be strengthened to cope with the vast scale of big data and to learn from it systematically. An unbalanced distribution of classes in big data, commonly known as imbalanced big data, makes the learning problem considerably harder. Conventional methods are being progressively modified, at both the data level and the algorithmic level, to handle and mitigate the problem of learning from imbalanced datasets in the context of big data. The current study applies a cluster-heads-based, data-level sampling solution that inherits the advantages of the K-Means and Fuzzy C-Means clustering approaches. The proposed approach is evaluated with three different classifiers, namely Support Vector Machines, Decision Tree and k-Nearest Neighbor, and compared with the conventional SMOTE algorithm. The experiments show promising results, with increases of 8.09% in accuracy and 35.71% in AUC across all imbalanced datasets. This work provides a baseline comparison of data-level solutions for imbalanced classification in the big data scenario and proposes an efficient clustering-based solution for the same.
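The abstract does not spell out how the cluster heads are turned into a balanced sample, so the following is only a rough sketch of one plausible reading and not the authors' method. It assumes that the majority class is clustered with plain K-Means and then replaced by its cluster centers, one per minority sample; the Fuzzy C-Means component mentioned in the abstract is left out. The function names `kmeans_centers` and `cluster_head_undersample` and the synthetic data are hypothetical and exist only for illustration.

```python
import numpy as np


def kmeans_centers(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-Means; returns the k cluster centers ("cluster heads")."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct points drawn from the data.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster went empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


def cluster_head_undersample(X, y, majority_label, seed=0):
    """Assumed scheme: swap the majority class for as many cluster heads
    as there are minority samples, leaving the minority class untouched."""
    X_maj = X[y == majority_label]
    X_min = X[y != majority_label]
    y_min = y[y != majority_label]
    k = len(X_min)  # requires k <= number of majority samples
    heads = kmeans_centers(X_maj, k, seed=seed)
    X_bal = np.vstack([heads, X_min])
    y_bal = np.concatenate([np.full(k, majority_label), y_min])
    return X_bal, y_bal


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic 20:1 imbalanced data set, purely for demonstration.
    X = np.vstack([rng.normal(0.0, 1.0, size=(500, 2)),   # majority class 0
                   rng.normal(3.0, 1.0, size=(25, 2))])   # minority class 1
    y = np.array([0] * 500 + [1] * 25)
    X_bal, y_bal = cluster_head_undersample(X, y, majority_label=0)
    print(np.bincount(y_bal))  # class counts after balancing
```

The balanced arrays could then be passed to any of the classifiers named in the abstract. Whether the paper uses cluster heads for undersampling, oversampling, or a combination is not stated here, so the full text should be consulted before relying on this reading.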


Bibliographic details

Source: International journal of systems assurance engineering and management, 2019, No. 4, pp. 824-835 (12 pages)
Authors: Khyati Ahlawat; Anuradha Chug; Amit Prakash Singh
Affiliation: University School of Information Communication and Technology, Guru Gobind Singh Indraprastha University, Sector 16C, Dwarka, Delhi 110078, India
Original format: PDF
Language: English
Keywords: Class imbalance; SMOTE; Sampling; Big data; Machine learning


Similar literature

1. K-Neighbor over-sampling with cleaning data: a new approach to improve classification performance in data sets with class imbalance [J]. Budi Santoso, Hari Wijayanto, Khairil Anwar Notodiputro. Applied mathematical sciences, 2018, No. 9a12.
2. RWO-Sampling: A random walk over-sampling approach to imbalanced data classification [J]. Huaxiang Zhang, Mingfang Li. Information Fusion, 2014.
3. Robust hybrid data-level sampling approach to handle imbalanced data during classification [J]. Kaur Prabhjot, Gosain Anjana. Soft computing: A fusion of foundations, methodologies and applications, 2020, No. 20.
4. Deep Over-sampling Framework for Classifying Imbalanced Data [C]. Shin Ando, Chun Yuan Huang. European conference on machine learning and principles and practice of knowledge discovery in databases, 2017.
5. Alleviating class imbalance using data sampling: Examining the effects on classification algorithms [D]. Napolitano, Amri E. 2006.
6. Risky Driver Recognition with Class Imbalance Data and Automated Machine Learning Framework [O]. Ke Wang, Qingwen Xue, Jian John Lu. 2021.
7. Deep Over-sampling Framework for Classifying Imbalanced Data [O]. Ando, Shin, Huang, Chun-Yuan. 2017.
