European Conference on Machine Learning (ECML 2004); 20-24 September 2004; Pisa (IT)

Applying Support Vector Machines to Imbalanced Datasets

Abstract

Support Vector Machines (SVM) have been extensively studied and have shown remarkable success in many applications. However, the success of SVM is very limited when it is applied to the problem of learning from imbalanced datasets in which negative instances heavily outnumber the positive instances (e.g., in gene profiling and detecting credit card fraud). This paper discusses the factors behind this failure and explains why the common strategy of undersampling the training data may not be the best choice for SVM. We then propose an algorithm for overcoming these problems, which is based on a variant of the SMOTE algorithm by Chawla et al., combined with the different error costs algorithm of Veropoulos et al. We compare the performance of our algorithm against these two algorithms, along with undersampling and regular SVM, and show that our algorithm outperforms all of them.
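The two ingredients named in the abstract can be illustrated with a minimal sketch. The abstract does not say which variant of SMOTE the authors use or how the error costs are chosen, so the code below makes its own assumptions: it uses plain SMOTE-style interpolation between minority-class neighbours, and it sets the positive-class cost to the negative-to-positive ratio that remains after oversampling. The names `smote_oversample` and `class_costs` are hypothetical helpers, not taken from the paper.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours.
    Plain SMOTE-style interpolation; the paper's variant may differ."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def class_costs(n_pos, n_neg, base_cost=1.0):
    """Assumed heuristic: penalise errors on the positive (+1) class in
    proportion to how strongly the negative (-1) class outnumbers it."""
    return {1: base_cost * n_neg / n_pos, -1: base_cost}


# Toy data: 4 positive points on the unit square, 40 negative points.
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
negatives = [[5.0 + 0.1 * i, 5.0] for i in range(40)]

new_pos = smote_oversample(positives, n_new=16, k=3)
balanced_pos = positives + new_pos
costs = class_costs(len(balanced_pos), len(negatives))
```

The resulting `costs` mapping is in the shape that a cost-sensitive SVM trainer would take as per-class penalties, for example the `class_weight` argument of scikit-learn's `SVC`, with the oversampled positives added to the training set.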