Journal: Computer and Information Science

An Empirical Analysis of Imbalanced Data Classification


Abstract

SVM has been given top consideration as an approach to the challenging problem of learning from imbalanced data. Here, we conduct an empirical classification analysis of new UCI datasets that have different imbalance ratios, sizes, and complexities. The experiments compare the classification results of SVM with those of two other popular classifiers, Naive Bayes and the C4.5 decision tree, to explore their pros and cons. To make the comparison more comprehensive and to better understand the learning performance of each classifier, we employ four performance metrics in total: Sensitivity, Specificity, G-means, and time-based efficiency. For each benchmark dataset, we perform an empirical search for the learning model by training the three classifiers many times under different parameter settings and performance measurements. This paper reports the most significant results, i.e., the highest performance achieved by each classifier on each dataset. In summary, SVM outperforms the other two classifiers in terms of Sensitivity (or Specificity) on all the datasets, and is more accurate in terms of G-means when classifying large datasets.
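The abstract names its metrics without defining them. The sketch below assumes the conventional definitions: sensitivity as the true positive rate, specificity as the true negative rate, and G-means as the geometric mean of the two. The function name `imbalance_metrics` and the toy labels are illustrative and do not come from the paper.

```python
from math import sqrt


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, g_mean) for binary labels.

    Assumes the conventional definitions; the paper may differ in detail.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # minority sample correctly detected
            else:
                fn += 1  # minority sample missed
        else:
            if pred == positive:
                fp += 1  # majority sample flagged by mistake
            else:
                tn += 1  # majority sample correctly rejected
    # Guard against a class that is absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    g_mean = sqrt(sensitivity * specificity)
    return sensitivity, specificity, g_mean


# Toy example: 2 minority (positive) samples and 8 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
sens, spec, g = imbalance_metrics(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} g_mean={g:.3f}")
```

In this toy case 8 of 10 predictions are correct, yet only half of the minority class is detected. That gap is why studies of imbalanced data usually report sensitivity, specificity, and their geometric mean rather than plain accuracy.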
