The Effects of Data Sampling with Deep Learning and Highly Imbalanced Big Data

Justin M. Johnson; Taghi M. Khoshgoftaar

Abstract

Training predictive models with class-imbalanced data has proven to be a difficult task. The problem is well studied, but the era of big data is producing more extreme levels of imbalance that are increasingly difficult to model. We use three data sets of varying complexity to evaluate data sampling strategies for treating high class imbalance with deep neural networks and big data. Sampling rates are varied to create training distributions whose positive class size ranges from 0.025% to 90%. The area under the receiver operating characteristic curve (AUC) is used to compare performance, and decision thresholding is used to maximize per-class performance. Random over-sampling (ROS) consistently outperforms random under-sampling (RUS) and baseline methods. The majority class proves susceptible to misrepresentation when using RUS, and results suggest that each data set is uniquely sensitive to imbalance and sample size. The hybrid ROS-RUS method maximizes performance and efficiency and is our preferred method for treating high imbalance in big data problems.
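
The abstract names three sampling strategies (ROS, RUS, and a hybrid ROS-RUS) without spelling out how a target class ratio is reached. The sketch below is one minimal way to do it, assuming binary labels (1 = positive/minority class), a target expressed as the positive class's share of the training set, and a hybrid that first under-samples the majority class and then over-samples the minority class. The function names and the `neg_keep_frac` parameter are hypothetical illustrations, not the authors' code.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: (feature vector, label), label 1 = positive (minority) class.
Example = Tuple[Sequence[float], int]


def split_by_class(data: List[Example]) -> Tuple[List[Example], List[Example]]:
    """Return (positives, negatives)."""
    pos = [ex for ex in data if ex[1] == 1]
    neg = [ex for ex in data if ex[1] == 0]
    return pos, neg


def positive_fraction(data: List[Example]) -> float:
    """Share of the training distribution that belongs to the positive class."""
    return sum(1 for _, y in data if y == 1) / len(data)


def _check_fraction(name: str, value: float) -> None:
    if not 0.0 < value < 1.0:
        raise ValueError(f"{name} must be strictly between 0 and 1, got {value}")


def random_over_sample(data: List[Example], target_pos_frac: float,
                       rng: random.Random) -> List[Example]:
    """ROS: add randomly drawn copies of positives (with replacement) until they
    make up target_pos_frac of the data. Never removes any example."""
    _check_fraction("target_pos_frac", target_pos_frac)
    pos, neg = split_by_class(data)
    if not pos:
        raise ValueError("cannot over-sample: no positive examples")
    # Solve p / (p + n) = f for p:  p = f * n / (1 - f)
    n_pos_needed = round(target_pos_frac * len(neg) / (1.0 - target_pos_frac))
    extra = max(0, n_pos_needed - len(pos))
    return pos + neg + [rng.choice(pos) for _ in range(extra)]


def random_under_sample(data: List[Example], target_pos_frac: float,
                        rng: random.Random) -> List[Example]:
    """RUS: keep every positive and a random subset of negatives (without
    replacement) so that positives make up target_pos_frac of the data."""
    _check_fraction("target_pos_frac", target_pos_frac)
    pos, neg = split_by_class(data)
    # Solve p / (p + n) = f for n:  n = p * (1 - f) / f
    n_neg_keep = min(len(neg), round(len(pos) * (1.0 - target_pos_frac) / target_pos_frac))
    return pos + rng.sample(neg, n_neg_keep)


def ros_rus(data: List[Example], neg_keep_frac: float, target_pos_frac: float,
            rng: random.Random) -> List[Example]:
    """Hybrid sketch: keep a random neg_keep_frac of the negatives, then
    over-sample positives until they reach target_pos_frac."""
    if not 0.0 < neg_keep_frac <= 1.0:
        raise ValueError(f"neg_keep_frac must be in (0, 1], got {neg_keep_frac}")
    pos, neg = split_by_class(data)
    reduced = pos + rng.sample(neg, round(len(neg) * neg_keep_frac))
    return random_over_sample(reduced, target_pos_frac, rng)
```

For example, with 10,000 examples of which 10 are positive (0.1%), `random_over_sample(data, 0.5, rng)` returns 19,980 examples, `random_under_sample(data, 0.5, rng)` returns only 20 (all but 10 of the 9,990 negatives are discarded), and `ros_rus(data, 0.1, 0.5, rng)` returns 1,998, each 50% positive. The hybrid keeps far more majority-class examples than RUS while producing a much smaller training set than ROS alone.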

Bibliographic Details

Source
Information Systems Frontiers, 2020, Issue 5, pp. 1113-1131 (19 pages)
Authors
Justin M. Johnson; Taghi M. Khoshgoftaar
Affiliations

Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA (both authors)
Record Information
Original format: PDF
Language: English
Keywords
Class imbalance; Big data; Data sampling; Artificial neural networks; Deep learning

Similar Articles

1. HSDLM: A Hybrid Sampling With Deep Learning Method for Imbalanced Data Classification [J]. Hasib Khan Md, Towhid Nurul Akter, Islam Md Rafiqul. International Journal of Cloud Applications and Computing, 2021, Issue 4.

2. Hybrid geometric sampling and AdaBoost based deep learning approach for data imbalance in E-commerce [J]. Sunita Dhote, Chandan Vichoray, Rupesh Pais. Electronic Commerce Research, 2020, Issue 2.

3. Assessments of Feature Selection Techniques with Respect to Data Sampling for Highly Imbalanced Software Measurement Data [J]. Kehan Gao, Taghi M. Khoshgoftaar. International Journal of Reliability, Quality and Safety Engineering, 2015, Issue 2.

4. A Data Augmentation-Assisted Deep Learning Model for High Dimensional and Highly Imbalanced Hyperspectral Imaging Data [C]. Juan F. Ramirez Rochac, Nian Zhang, Lara Thompson. International Conference on Information Science and Technology, 2019.

5. Deep Learning Based Imbalanced Data Classification and Information Retrieval for Multimedia Big Data [D]. Yilin Yan. 2018.

6. Facial Expression Recognition Based on Weighted-Cluster Loss and Deep Transfer Learning Using a Highly Imbalanced Dataset [O]. Quan T. Ngo, Seokhoon Yoon. 2020.

机译：基于高度不平衡数据集的加权聚类损失和深度转移学习的面部表情识别
7. Facial Expression Recognition Based on Weighted-Cluster Loss and Deep Transfer Learning Using a Highly Imbalanced Dataset [O] . Quan T. Ngo, Seokhoon Yoon 2020

机译：基于加权集群丢失和使用高度不平衡数据集的深度传输学习的面部表情识别

The Effects of Data Sampling with Deep Learning and Highly Imbalanced Big Data

摘要

著录项

相似文献

相关主题

期刊订阅