Journal: Neural Networks: The Official Journal of the International Neural Network Society

A systematic study of the class imbalance problem in convolutional neural networks


Abstract

In this study, we systematically investigate the impact of class imbalance on the classification performance of convolutional neural networks (CNNs) and compare frequently used methods to address the issue. Class imbalance is a common problem that has been comprehensively studied in classical machine learning, yet very limited systematic research is available in the context of deep learning. In our study, we use three benchmark datasets of increasing complexity, MNIST, CIFAR-10 and ImageNet, to investigate the effects of imbalance on classification and perform an extensive comparison of several methods to address the issue: oversampling, undersampling, two-phase training, and thresholding that compensates for prior class probabilities. Our main evaluation metric is area under the receiver operating characteristic curve (ROC AUC) adjusted to multi-class tasks, since the overall accuracy metric is associated with notable difficulties in the context of imbalanced data. Based on the results of our experiments, we conclude that (i) the effect of class imbalance on classification performance is detrimental; (ii) the method of addressing class imbalance that emerged as dominant in almost all analyzed scenarios was oversampling; (iii) oversampling should be applied to the level that completely eliminates the imbalance, whereas the optimal undersampling ratio depends on the extent of imbalance; (iv) as opposed to some classical machine learning models, oversampling does not cause overfitting of CNNs; (v) thresholding should be applied to compensate for prior class probabilities when the overall number of properly classified cases is of interest. © 2018 Elsevier Ltd. All rights reserved.
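
Two of the methods named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that oversampling means randomly repeating minority-class examples until every class matches the largest one, and that thresholding means dividing each class output by its training-set prior and renormalizing. The function names, the toy labels and the example probabilities are all hypothetical.

```python
import random
from collections import Counter


def oversample_to_balance(labels, seed=0):
    """Return example indices in which minority classes are repeated until
    every class has as many examples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = max(len(idxs) for idxs in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        # Draw extra copies with replacement to reach the majority count.
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    rng.shuffle(balanced)
    return balanced


def threshold_by_priors(probs, priors):
    """Divide each class output by its prior, renormalize, and return
    (adjusted probabilities, index of the predicted class)."""
    scaled = [p / q for p, q in zip(probs, priors)]
    total = sum(scaled)
    adjusted = [s / total for s in scaled]
    predicted = max(range(len(adjusted)), key=adjusted.__getitem__)
    return adjusted, predicted


# Hypothetical toy data: 8 majority-class and 2 minority-class examples.
labels = [0] * 8 + [1] * 2
balanced = oversample_to_balance(labels)
counts = Counter(labels[i] for i in balanced)

# Assumed priors equal to the class frequencies of the toy training set.
priors = [0.8, 0.2]
adjusted, predicted = threshold_by_priors([0.7, 0.3], priors)
```

In this toy case the raw output favors class 0, but after compensating for the 4:1 prior the minority class 1 is predicted. The oversampled index list contains each class equally often, which corresponds to conclusion (iii) that oversampling should fully eliminate the imbalance.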
