Data Sampling Methods to Deal With the Big Data Multi-Class Imbalance Problem

Abstract

The class imbalance problem has been a hot topic in the machine learning community in recent years, and in the current era of big data and deep learning it remains relevant. Much work has been devoted to the class imbalance problem, with random sampling methods (oversampling and undersampling) being the most widely employed approaches. More sophisticated sampling methods have also been developed, including the Synthetic Minority Over-sampling Technique (SMOTE), and these have been combined with cleaning techniques such as Edited Nearest Neighbors (ENN) or Tomek’s Links (SMOTE+ENN and SMOTE+TL, respectively). In the big data context, the class imbalance problem has mostly been addressed by adapting traditional techniques, with comparatively little attention paid to intelligent approaches. This work therefore analyzes the capabilities and possibilities of heuristic sampling methods for deep learning neural networks in the big data domain, with particular attention to cleaning strategies. The study is carried out on big data, multi-class imbalanced datasets obtained from hyperspectral remote sensing images. It analyzes the effectiveness of a hybrid approach on these datasets: the dataset is first balanced with SMOTE and an Artificial Neural Network (ANN) is trained on it; ENN is then applied to the neural network’s output to eliminate output noise, and finally the ANN is trained again on the resulting dataset. The results suggest that the best classification outcome is achieved when the cleaning strategies are applied to the ANN output rather than only to the input feature space. Consequently, the nature of the classifier needs to be considered when classical class imbalance approaches are adapted to deep learning and big data scenarios.
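The abstract names SMOTE and ENN without showing how they work. The following is a minimal pure-Python sketch, not the authors' code and not the `imbalanced-learn` implementation, assuming Euclidean distance, a multi-class SMOTE that oversamples every class up to the size of the largest one, and Wilson's majority-vote editing rule for ENN. The function names (`knn_indices`, `smote`, `enn`, `smote_enn`) and the `represent` hook are hypothetical. With the identity `represent`, `smote_enn` is classical SMOTE+ENN on the input features; passing a function that returns a trained network's output vectors would roughly mirror the paper's idea of cleaning in the ANN output space before retraining.

```python
import math
import random
from collections import Counter


def knn_indices(points, query_idx, k, candidates=None):
    """Indices of the k nearest neighbours (Euclidean) of points[query_idx],
    searched among `candidates` and excluding the query point itself."""
    if candidates is None:
        candidates = range(len(points))
    q = points[query_idx]
    others = [i for i in candidates if i != query_idx]
    others.sort(key=lambda i: math.dist(q, points[i]))  # brute-force search
    return others[:k]


def smote(X, y, k=5, seed=0):
    """Multi-class SMOTE: oversample every class up to the size of the largest
    class by interpolating between a sample and a random same-class neighbour."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, n in counts.items():
        if n >= target or n < 2:
            continue  # largest class, or a singleton that cannot be interpolated
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            j = rng.choice(knn_indices(X, i, k, candidates=idx))
            gap = rng.random()  # position along the segment X[i] -> X[j]
            X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
            y_out.append(label)
    return X_out, y_out


def enn(Z, y, k=3):
    """Wilson's Edited Nearest Neighbours: return the indices of samples whose
    label matches the majority vote of their k nearest neighbours in Z."""
    keep = []
    for i in range(len(Z)):
        votes = Counter(y[j] for j in knn_indices(Z, i, k))
        # most_common keeps first-encountered order on ties, so the nearest tied label wins
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return keep


def smote_enn(X, y, represent=lambda x: x, k_smote=5, k_enn=3, seed=0):
    """Balance with SMOTE, then clean with ENN computed in the space returned by
    the hypothetical `represent` hook (identity = input feature space)."""
    Xb, yb = smote(X, y, k=k_smote, seed=seed)
    Z = [represent(x) for x in Xb]
    keep = enn(Z, yb, k=k_enn)
    return [Xb[i] for i in keep], [yb[i] for i in keep]


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
    X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(8)]
    X += [[rng.gauss(0, 1), rng.gauss(4, 1)] for _ in range(5)]
    y = [0] * 40 + [1] * 8 + [2] * 5
    _, y_bal = smote(X, y)
    print(Counter(y_bal))  # each class now has 40 samples
    _, y_clean = smote_enn(X, y)
    print(Counter(y_clean))  # counts after ENN removes noisy samples
```

The brute-force neighbour search is quadratic in the number of samples, so on big hyperspectral datasets one would likely swap in an indexed or approximate nearest-neighbour search; the oversampling and editing logic would stay the same.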
