Improving Imbalanced Dataset Classification Using Oversampling and Gradient Boosting

Abstract

Imbalanced data classification is a challenging task for many real-world datasets. One technique for enlarging the minority class is oversampling, which increases its size to match that of the majority class. This research tests SMOTE, Borderline-SMOTE, and ADASYN for handling dataset imbalance and observes their impact on classification accuracy. Gradient Boosting is applied as the classifier, and seven datasets are used. Accuracy, recall, precision, F1-score, and AUC are used to measure classifier performance. Experiments showed that oversampling increased accuracy by 2% to 11% on the Mammography, Liver Disorders, Diabetes (Pima Indian), Indian Liver, Haberman, and Immunotherapy datasets. Borderline-SMOTE yielded a larger accuracy gain than the other oversampling methods. Surprisingly, accuracy on Breast Cancer Wisconsin was steady with or without oversampling. Although oversampling is beneficial for imbalanced data, the sensitivity of the oversampling algorithm and the nature of the dataset must be considered.
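To make the oversampling idea in the abstract concrete, here is a minimal sketch of SMOTE-style interpolation written with the Python standard library only. It is not the authors' implementation: the function name `smote_oversample`, the toy data, and the parameter values are assumptions made for illustration. In practice, the imbalanced-learn package provides `SMOTE`, `BorderlineSMOTE`, and `ADASYN`, and scikit-learn provides `GradientBoostingClassifier`.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Simplified SMOTE (hypothetical helper, not the paper's code): pick a
    minority sample, pick one of its k nearest minority neighbours, and
    place a new point at a random position on the segment joining them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other minority samples, sorted by distance to `base`.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (made up for illustration): 8 majority points, 3 minority points.
majority = [(float(x), float(y)) for x in range(4) for y in range(2)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5)]

# Generate enough synthetic points to match the majority class size.
new_points = smote_oversample(
    minority, n_new=len(majority) - len(minority), k=2
)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used to compute accuracy, recall, precision, F1-score, or AUC.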

Bibliographic details

Source
International Conference on Science in Information Technology | 2019 | 1 v. | 6 pages
Authors
Nurheri Cahyana; Siti Khomsah; Agus Sasmito Aribowo
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords
Imbalanced dataset; Oversampling; Gradient-Boosting-Classifier
