Improving Imbalanced Dataset Classification Using Oversampling and Gradient Boosting


Abstract

Classifying imbalanced data is a challenging task for many real-world datasets. One technique for enlarging the minority class is oversampling, which raises its size to match that of the majority class. This research tests SMOTE, Borderline-SMOTE, and ADASYN for handling dataset imbalance and observes their impact on classification accuracy. Gradient Boosting is applied as the classifier, and seven datasets are used. Accuracy, recall, precision, F1-score, and AUC are used to measure classifier performance. Experiments showed that oversampling techniques increase accuracy by 2% to 11% on the Mammography, Liver Disorders, Diabetes (Pima Indian), Indian Liver, Haberman, and Immunotherapy datasets. Borderline-SMOTE yields a larger accuracy gain than the other oversampling methods. Surprisingly, accuracy on Breast Cancer Wisconsin stays steady with or without oversampling. Although oversampling is good for imbalanced data, the sensitivity of the oversampling algorithm and the nature of the dataset must be considered.
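
The oversampling step the abstract refers to can be illustrated with a small example. The paper's own implementation is not reproduced in this record, so what follows is only a minimal sketch of the interpolation idea behind SMOTE, written in pure Python and assuming Euclidean distance. The function names `smote` and `oversample_to_balance`, the parameter defaults, and the toy data are hypothetical, chosen for illustration. Borderline-SMOTE and ADASYN differ mainly in which minority points they pick as bases and how many synthetic samples each base receives; those variants are not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def oversample_to_balance(majority, minority, k=5, seed=0):
    """Grow the minority class until it matches the majority class size."""
    n_new = max(0, len(majority) - len(minority))
    return list(minority) + smote(minority, n_new, k=k, seed=seed)


# Toy data (hypothetical): 30 majority points and 5 minority points.
majority = [(float(x), float(y)) for x in range(6) for y in range(5)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 12.0), (12.0, 11.0), (11.5, 11.5)]

balanced_minority = oversample_to_balance(majority, minority, k=3)
print(len(majority), len(balanced_minority))  # prints: 30 30
```

Every synthetic point lies on a segment between two existing minority samples, so the new points stay inside the region the minority class already occupies. In an experiment like the one described, resampling would normally be applied to the training split only, so that the reported metrics are computed on untouched test data.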


Bibliographic details

Source
International Conference on Science in Information Technology | 2019 | pp. 217-222 | 6 pages
Authors
Nurheri Cahyana; Siti Khomsah; Agus Sasmito Aribowo
Original format
PDF
Keywords
Imbalanced dataset; Oversampling; Gradient-Boosting-Classifier


