2011 11th International Conference on Intelligent Systems Design and Applications

Novel resampling method for the classification of imbalanced datasets for industrial and other real-world problems

Abstract

This paper presents a novel resampling method for coping with imbalanced datasets in binary classification problems. Imbalanced datasets are common in industrial applications: for instance, particular product defects or machine faults are rare events whose detection is of utmost importance. The proposed method combines an oversampling technique with an undersampling technique. Several tests were carried out to assess its effectiveness. Two classifiers, one based on a Support Vector Machine and one on a Decision Tree, were designed and applied to binary classification on four datasets: a synthetic dataset, a widely used public dataset, and two industrial datasets. The results are presented and discussed; in particular, the performance the two classifiers achieve with the proposed resampling approach is compared with the performance they achieve without any resampling and with the classical SMOTE approach.
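The abstract does not describe the proposed algorithm in detail, so the sketch below is not the authors' method. It is a minimal illustration, under stated assumptions, of the general idea of pairing the two ingredients the abstract mentions: classical SMOTE oversampling of the minority class (here the common variant that picks base samples at random) plus random undersampling of the majority class. The function names `smote` and `hybrid_resample` and the `meet` parameter (how far from the minority count toward the majority count both classes are moved) are hypothetical, and the code uses only the Python standard library.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def smote(minority: Sequence[Point], n_new: int, k: int = 5,
          rng: Optional[random.Random] = None) -> List[Point]:
    """SMOTE-style oversampling (random-base variant, illustrative only).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if rng is None:
        rng = random.Random()
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest *other* minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def hybrid_resample(X: Sequence[Sequence[float]], y: Sequence[int],
                    minority_label: int, meet: float = 0.5, k: int = 5,
                    seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Hypothetical hybrid: randomly undersample the majority class and
    SMOTE the minority class until both have the same number of samples.

    `meet` (hypothetical parameter): 0 keeps the minority size (pure
    undersampling), 1 keeps the majority size (pure oversampling).
    """
    if not 0.0 <= meet <= 1.0:
        raise ValueError("meet must be in [0, 1]")
    rng = random.Random(seed)
    maj_idx = [i for i, lbl in enumerate(y) if lbl != minority_label]
    minority = [tuple(X[i]) for i, lbl in enumerate(y) if lbl == minority_label]
    if len(minority) >= len(maj_idx):
        return [tuple(x) for x in X], list(y)  # nothing to rebalance
    # Common per-class size after resampling, between n_minority and n_majority.
    target = len(minority) + round(meet * (len(maj_idx) - len(minority)))
    kept = rng.sample(maj_idx, target)  # random undersampling without replacement
    synthetic = smote(minority, target - len(minority), k=k, rng=rng)
    X_out = [tuple(X[i]) for i in kept] + minority + synthetic
    y_out = [y[i] for i in kept] + [minority_label] * (len(minority) + len(synthetic))
    return X_out, y_out
```

With 20 majority and 4 minority samples and `meet=0.5`, both classes end up with 12 samples: 8 majority samples are dropped and 8 synthetic minority samples are added. Resampling of this kind is normally applied only to the training split, so that the test set keeps the original class distribution when the SVM and Decision Tree classifiers are evaluated.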