Applied Soft Computing

Integration of feature vector selection and support vector machine for classification of imbalanced data



Abstract

Support Vector Machine (SVM) has been widely developed for tackling classification problems. Imbalanced data arise in many practical classification problems, where the minority class is usually the class of interest. Undersampling is a popular solution to such problems; however, it risks discarding useful information in the original data. Tuning the hyperparameters of an SVM is also challenging. Based on an analysis of the geometrical meaning of kernel methods, this paper proposes an approach that combines a modified Feature Vector Selection (FVS) method, which maximizes between-class separability, with Feature Vector Regression (FVR), an easy-to-tune version of SVM proposed in our previous work. The modified FVS method selects a small number of data points that linearly represent the entire dataset in the Reproducing Kernel Hilbert Space (RKHS) and that also give maximal separability of the imbalanced data in the RKHS. As in least-squares SVM, the FVR model is solved analytically. The decision threshold for classification is optimized to maximize a predefined accuracy metric. Twenty-six imbalanced datasets are considered, and the proposed method is compared with several SVM-based methods for imbalanced data. A statistical test shows the effectiveness of the proposed method. (C) 2018 Elsevier B.V. All rights reserved.
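The pipeline described in the abstract (select feature vectors in the RKHS, fit a closed-form kernel model on them, then tune the decision threshold) can be illustrated with a minimal sketch. It rests on several loudly stated assumptions: it uses the original, unmodified FVS fitness criterion of Baudat and Anouar and does NOT implement this paper's between-class separability modification; it assumes a Gaussian (RBF) kernel; it replaces FVR with a hypothetical stand-in, a closed-form ridge least-squares fit on the kernel columns of the selected vectors (in the spirit of least-squares SVM, not the authors' exact FVR formulation); and it assumes G-mean as the "predefined accuracy metric". All function names, the kernel width, `tol`, and `lam` are illustrative choices, not values from the paper.

```python
import math


# Assumption: Gaussian (RBF) kernel; the abstract does not fix the kernel.
def rbf(a, b, gamma=1.0):
    """k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def local_fitness(K, S, i):
    """Share of ||phi(x_i)||^2 captured by projecting phi(x_i) onto span{phi(x_s) : s in S}."""
    if not S:
        return 0.0
    # Tiny diagonal jitter keeps K_SS invertible when selected points are nearly dependent.
    Kss = [[K[a][b] + (1e-10 if a == b else 0.0) for b in S] for a in S]
    ksi = [K[s][i] for s in S]
    coef = solve(Kss, ksi)
    return sum(c * k for c, k in zip(coef, ksi)) / K[i][i]


def select_feature_vectors(K, tol=0.95, max_fv=None):
    """Greedy (unmodified) FVS: add the point that maximises the summed local fitness
    until every point has local fitness >= tol or max_fv points are selected."""
    n = len(K)
    max_fv = max_fv or n
    S = []
    while len(S) < max_fv:
        if S and min(local_fitness(K, S, i) for i in range(n)) >= tol:
            break
        best = max((c for c in range(n) if c not in S),
                   key=lambda c: sum(local_fitness(K, S + [c], i) for i in range(n)))
        S.append(best)
    return S


def fit_scores(K, S, y, lam=1e-3):
    """Hypothetical FVR stand-in: closed-form ridge least squares on the features
    [k(x_i, x_s) for s in S] + [1]; returns the decision scores on the training points."""
    Phi = [[K[i][s] for s in S] + [1.0] for i in range(len(y))]
    m = len(S) + 1
    # Normal equations (Phi^T Phi + lam * D) w = Phi^T y, with the bias (last weight) unpenalised.
    A = [[sum(r[a] * r[b] for r in Phi) + (lam if (a == b and a < m - 1) else 0.0)
          for b in range(m)] for a in range(m)]
    rhs = [sum(r[a] * t for r, t in zip(Phi, y)) for a in range(m)]
    w = solve(A, rhs)
    return [sum(wj * fj for wj, fj in zip(w, r)) for r in Phi]


def best_threshold(scores, y):
    """Threshold maximising G-mean = sqrt(TPR * TNR); label +1 is the minority class."""
    pos = sum(1 for t in y if t > 0)
    neg = len(y) - pos
    best_t, best_g = None, -1.0
    for thr in sorted(set(scores)):
        tp = sum(1 for s, t in zip(scores, y) if s >= thr and t > 0)
        tn = sum(1 for s, t in zip(scores, y) if s < thr and t < 0)
        g = math.sqrt((tp / pos) * (tn / neg))
        if g > best_g:
            best_t, best_g = thr, g
    return best_t, best_g


# Toy imbalanced data: 8 majority points (-1) near the origin, 2 minority points (+1).
X = [(0, 0), (0.5, 0.2), (-0.3, 0.4), (0.2, -0.5), (-0.4, -0.2),
     (0.6, 0.6), (-0.6, 0.1), (0.1, 0.3), (3, 3), (3.2, 2.8)]
y = [-1] * 8 + [1, 1]
K = [[rbf(a, b) for b in X] for a in X]
S = select_feature_vectors(K, tol=0.95)
scores = fit_scores(K, S, y)
thr, g = best_threshold(scores, y)
print("feature vectors:", S, "threshold:", round(thr, 3), "G-mean:", g)
```

Because the minority points are poorly represented by the majority points in the RKHS, the fitness-based stopping rule forces the greedy search to include minority samples among the feature vectors, which is the property that makes FVS attractive for imbalanced data compared with random undersampling. Tuning the threshold on the scores, rather than fixing it at 0, is what lets the chosen metric (here the assumed G-mean) drive the final decision rule.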
