Conference: International Conference on Software, Knowledge Information Management and Applications

An Under-Sampling Method with Support Vectors in Multi-class Imbalanced Data Classification



Abstract

Multi-class imbalanced data classification in supervised learning is one of the most challenging research problems in machine learning for data mining applications. Although computational intelligence researchers have introduced several data sampling methods over the past decades for handling imbalanced data, learning from imbalanced data remains a challenging task and a significant focus of research interest. Traditional machine learning algorithms are usually biased toward the majority class instances and tend to ignore the minority class instances. As a result, the prediction accuracy of classifiers may suffer. Under-sampling and over-sampling methods are commonly used with single-model classifiers or ensemble learning to deal with imbalanced data. In this paper, we introduce an under-sampling method with support vectors for classifying imbalanced data. The proposed approach selects the most informative majority class instances based on the support vectors that help to form the decision boundary. We tested the performance of the proposed method with single classifiers (C4.5 decision tree and naïve Bayes) and ensemble classifiers (Random Forest and AdaBoost) on 13 benchmark imbalanced datasets. The experimental results show that the proposed method produces high accuracy when classifying both minority and majority class instances compared to other existing methods.
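
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one plausible reading: fit an SVM on the full data and keep only those majority class instances that end up as support vectors, while keeping every other instance. The function name `undersample_with_support_vectors`, the linear kernel, the value of `C`, the treatment of the single most frequent label as "the majority class", and the synthetic data are all assumptions made for illustration and are not taken from the paper. It uses scikit-learn's `SVC` and its `support_` attribute.

```python
import numpy as np
from sklearn.svm import SVC


def undersample_with_support_vectors(X, y, kernel="linear", C=1.0):
    """Keep all non-majority instances plus majority instances that are support vectors.

    Assumption: the majority class is the single most frequent label.
    The paper may handle several large classes differently.
    """
    X = np.asarray(X)
    y = np.asarray(y)

    # Identify the most frequent label.
    labels, counts = np.unique(y, return_counts=True)
    majority_label = labels[np.argmax(counts)]

    # SVC handles multi-class data with a one-vs-one scheme;
    # support_ holds the training indices of all support vectors.
    svm = SVC(kernel=kernel, C=C).fit(X, y)
    is_support = np.zeros(len(y), dtype=bool)
    is_support[svm.support_] = True

    # Drop majority instances that are not support vectors.
    keep = (y != majority_label) | is_support
    return X[keep], y[keep]


# Synthetic three-class data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2)),  # majority class 0
    rng.normal(loc=[4.0, 0.0], scale=1.0, size=(20, 2)),   # minority class 1
    rng.normal(loc=[0.0, 4.0], scale=1.0, size=(15, 2)),   # minority class 2
])
y = np.array([0] * 200 + [1] * 20 + [2] * 15)

X_res, y_res = undersample_with_support_vectors(X, y)
print("class counts before:", np.bincount(y))
print("class counts after: ", np.bincount(y_res))
```

The reduced set `X_res, y_res` could then be passed to any downstream classifier, such as a decision tree, naïve Bayes, Random Forest, or AdaBoost model. Kernel and `C` strongly influence how many majority instances survive, so they would need tuning per dataset.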
