IEEE Transactions on Neural Networks and Learning Systems

A Distance-Based Weighted Undersampling Scheme for Support Vector Machines and its Application to Imbalanced Classification

Abstract

A support vector machine (SVM) plays a prominent role in classic machine learning, especially in classification and regression. Owing to structural risk minimization, it has a good reputation for effectively reducing overfitting, avoiding the curse of dimensionality, and not falling into local minima. Nevertheless, existing SVMs do not perform well when facing class imbalance and large-scale samples. Undersampling is a plausible way to address class imbalance, but it suffers from soaring computational complexity and reduced accuracy because of its large number of iterations and its random sampling process. To improve SVM classification performance on imbalanced data, this work proposes a weighted undersampling (WU) scheme for SVM based on space geometry distance, yielding an improved algorithm named WU-SVM. In WU-SVM, majority-class samples are grouped into subregions (SRs) and assigned different weights according to their Euclidean distance to the hyperplane. Samples in an SR with a higher weight have a greater chance of being sampled and used in each learning iteration, so that as much of the distribution information of the original data sets as possible is retained. Comprehensive experiments test WU-SVM on 21 binary-class and six multiclass publicly available data sets. The results show that it clearly outperforms state-of-the-art methods in terms of three popular metrics for imbalanced classification: area under the curve (AUC), F-Measure, and G-Mean.
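The abstract only outlines the sampling step, so what follows is a minimal Python sketch under stated assumptions, not the authors' WU-SVM. It assumes a linear hyperplane w · x + b = 0 (for example, from an SVM trained in the previous iteration), splits the majority-class samples into equal-width bands of Euclidean distance to that hyperplane, and gives band k the weight 1/(k + 1) so that bands nearer the hyperplane are favoured. The band construction, the weighting rule, and the helper names (`distance_to_hyperplane`, `group_into_subregions`, `weighted_undersample`) are hypothetical. The abstract does not state how many subregions are used or which ones receive higher weights.

```python
import heapq
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def distance_to_hyperplane(x: Vector, w: Vector, b: float) -> float:
    """Euclidean distance from point x to the hyperplane w . x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w


def group_into_subregions(majority: List[Vector], w: Vector, b: float,
                          n_regions: int) -> List[List[int]]:
    """Split majority-sample indices into n_regions equal-width distance bands.

    Region 0 is nearest the hyperplane. Equal-width banding is an assumption.
    """
    dists = [distance_to_hyperplane(x, w, b) for x in majority]
    d_min, d_max = min(dists), max(dists)
    width = (d_max - d_min) / n_regions or 1.0  # avoid division by zero
    regions: List[List[int]] = [[] for _ in range(n_regions)]
    for i, d in enumerate(dists):
        k = min(int((d - d_min) / width), n_regions - 1)
        regions[k].append(i)
    return regions


def weighted_undersample(majority: List[Vector], w: Vector, b: float,
                         n_regions: int, n_draw: int,
                         rng: random.Random) -> List[int]:
    """Draw n_draw distinct majority indices, favouring high-weight subregions."""
    if not 0 < n_draw <= len(majority):
        raise ValueError("n_draw must be in 1..len(majority)")
    regions = group_into_subregions(majority, w, b, n_regions)
    # Hypothetical weighting: every sample in region k gets weight 1 / (k + 1),
    # so samples closer to the hyperplane are more likely to be kept.
    sample_weight = [0.0] * len(majority)
    for k, members in enumerate(regions):
        for i in members:
            sample_weight[i] = 1.0 / (k + 1)
    # Weighted sampling without replacement (Efraimidis-Spirakis keys):
    # each index gets key u ** (1 / weight); the n_draw largest keys are kept.
    keys = ((rng.random() ** (1.0 / sample_weight[i]), i)
            for i in range(len(majority)))
    return sorted(i for _, i in heapq.nlargest(n_draw, keys))


if __name__ == "__main__":
    rng = random.Random(42)
    majority = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(200)]
    # Assume the hyperplane x1 - x2 = 0 came from a previously trained linear SVM.
    kept = weighted_undersample(majority, w=(1.0, -1.0), b=0.0,
                                n_regions=5, n_draw=20, rng=rng)
    print(len(kept), kept[:5])
```

In a full undersampling loop, one would typically draw about as many majority samples as there are minority samples, retrain the SVM on the balanced subset, recompute the hyperplane, and repeat. How WU-SVM schedules these iterations and combines the resulting classifiers is not described in the abstract.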
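For reference, the three evaluation metrics named in the abstract can be computed as in the plain-Python sketch below. It assumes binary labels with the minority class treated as the positive class, and uses F1 (beta = 1) as the F-Measure. AUC is computed here from continuous scores via the Mann-Whitney statistic; some imbalanced-learning papers instead compute it from hard predictions as (TPR + TNR) / 2, and the abstract does not say which convention the authors use. The function names are illustrative.

```python
import math
from typing import Hashable, Sequence, Tuple


def confusion_counts(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     positive: Hashable = 1) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f_measure(y_true, y_pred, positive=1, beta: float = 1.0) -> float:
    """F-beta score of the positive (minority) class; beta = 1 gives F1."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def g_mean(y_true, y_pred, positive=1) -> float:
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(y_true, scores, positive=1) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

G-Mean penalizes a classifier that ignores the minority class: predicting every sample as the majority class gives a sensitivity of 0 and therefore a G-Mean of 0, even when plain accuracy is high.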

Bibliographic details

  • Author affiliations

    Department of Control Science and Engineering, School of Electronics and Information Engineering, Tongji University, Shanghai, China;

    Department of Control Science and Engineering, School of Electronics and Information Engineering, Tongji University, Shanghai, China;

    Institute of Systems Engineering, Macau University of Science and Technology, Macau, China;

    Department of Control Science and Engineering, School of Electronics and Information Engineering, Tongji University, Shanghai, China;

    Department of Control Science and Engineering, School of Electronics and Information Engineering, Tongji University, Shanghai, China;

    Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA;

  • Original format: PDF
  • Language: English
  • Keywords

    Support vector machines; Kernel; Euclidean distance; Training; Learning systems; Machine learning algorithms; Optimization;