Conference: Discovery Science

Barricaded Boundary Minority Oversampling LS-SVM for a Biased Binary Classification



Abstract

Classifying biased datasets with linearly non-separable features has long been a challenge in pattern recognition, because traditional classifiers tend to be skewed towards the majority class and often produce sub-optimal results. If biased or unbalanced data is not processed appropriately, any information extracted from it risks being compromised. Least Squares Support Vector Machines (LS-SVM) are known for their computational advantage over SVM, but they suffer from a lack of sparsity in the support vectors: LS-SVM learns the separating hyperplane from the whole dataset and often produces biased hyperplanes on imbalanced datasets. To address the supervised classification of imbalanced datasets, we propose Barricaded Boundary Minority Oversampling (BBMO), which oversamples the minority samples at the boundary in the direction of the closest majority samples in order to remove the bias that data imbalance introduces into LS-SVM. Two variations of BBMO are studied: BBMO1, for the linearly separable case, uses the Lagrange multipliers to extract boundary samples from both classes; the generalized BBMO2, for the nonlinear case, uses the kernel matrix to extract the closest majority samples to each minority sample. In either case, BBMO computes weighted means as new synthetic minority samples and appends them to the dataset. Experiments on a range of synthetic and real-world datasets show that BBMO with LS-SVM improves on other methods in the literature and motivate follow-on research.
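The oversampling step described for the nonlinear case can be illustrated with a minimal sketch. The abstract does not specify the kernel, the weighting scheme, or how many synthetic samples are generated, so everything concrete here is an assumption made for illustration: the RBF kernel, the `gamma` and `weight` values, one synthetic point per minority sample, and the helper names `rbf_kernel` and `oversample_toward_majority`. It is not the authors' implementation.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix between rows of A and rows of B (assumed kernel choice)."""
    # Squared Euclidean distance between every row of A and every row of B.
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def oversample_toward_majority(X_min, X_maj, weight=0.25, gamma=1.0):
    """Hypothetical helper: one synthetic point per minority sample."""
    K = rbf_kernel(X_min, X_maj, gamma)
    # With an RBF kernel, the largest kernel value marks the closest majority sample.
    nearest = K.argmax(axis=1)
    # Weighted mean of each minority sample and its closest majority sample
    # (the value of `weight` is an assumption, not taken from the paper).
    return (1.0 - weight) * X_min + weight * X_maj[nearest]

# Toy data: two minority points and three majority points.
X_min = np.array([[0.0, 0.0], [1.0, 0.0]])
X_maj = np.array([[4.0, 0.0], [0.0, 2.0], [5.0, 5.0]])

synthetic = oversample_toward_majority(X_min, X_maj)
# Append the synthetic samples to the minority class, as the abstract describes.
X_min_augmented = np.vstack([X_min, synthetic])
```

In this toy setup both minority points have `[0.0, 2.0]` as their closest majority sample, so each synthetic point is shifted a quarter of the way towards it. The augmented minority set would then be passed to whichever LS-SVM implementation is in use.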
