Journal: Pattern Analysis and Applications

A fast classification strategy for SVM on the large-scale high-dimensional datasets

Abstract

Classifying large-scale, high-dimensional datasets poses three challenges: (1) both the training phase and the classification phase carry a heavy computational burden; (2) storing the many training samples requires a large amount of memory; and (3) decision rules are difficult to determine in high-dimensional data. The nonlinear support vector machine (SVM) is a popular classifier that performs well on high-dimensional datasets. However, it easily overfits, especially when the data are not evenly distributed. The profile support vector machine (PSVM) was recently proposed to solve this problem. Because local learning is superior to global learning, PSVM trains multiple linear SVM models to achieve performance similar to that of a single nonlinear SVM model. However, PSVM is inefficient in the training phase. In this paper, we propose a fast classification strategy for PSVM that reduces both training time and classification time. We first select border samples near the decision boundary from the training samples. The reduced training set is then clustered into several local subsets by the MagKmeans algorithm, for which we propose a fast search method to find the optimal solution. The clusters are then used to learn the multiple linear SVM models. Both artificial and real datasets are used to evaluate the proposed method. The experimental results show that the proposed method prevents both overfitting and underfitting, and that the strategy is effective and efficient.
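
The three-stage pipeline in the abstract (border-sample selection, clustering, one linear SVM per cluster) can be illustrated with a rough sketch. The abstract does not specify the border-selection rule, the MagKmeans objective, the fast search method, or how a test sample is routed to a local model, so everything below is an assumption: border samples are taken to be those nearest to an opposite-class sample, plain k-means stands in for MagKmeans, each linear SVM is trained with a Pegasos-style stochastic subgradient method, and a test sample is routed to the nearest cluster centre. All function names are hypothetical, and labels are assumed to be +1 and -1 with both classes present.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def border_samples(X, y, keep=0.5):
    """Assumed heuristic: keep the fraction of samples that lie closest
    to any sample of the opposite class (needs both classes present)."""
    scored = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        d = min(dist2(xi, xj) for xj, yj in zip(X, y) if yj != yi)
        scored.append((d, i))
    scored.sort()
    idx = sorted(i for _, i in scored[:max(2, int(keep * len(X)))])
    return [X[i] for i in idx], [y[i] for i in idx]

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means, used here only as a stand-in for MagKmeans."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(X, k)]

    def nearest(x):
        return min(range(k), key=lambda c: dist2(x, centers[c]))

    for _ in range(iters):
        assign = [nearest(x) for x in X]
        for c in range(k):
            members = [x for x, a in zip(X, assign) if a == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, [nearest(x) for x in X]

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style SGD on the L2-regularised hinge loss.
    A constant feature 1.0 is appended so the last weight acts as a bias."""
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xa[0])
    order = list(range(len(Xa)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w

def fit_local_svms(X, y, k=2, keep=0.5):
    """Border selection -> clustering -> one linear SVM per non-empty cluster.
    Returns a list of (centre, weights) pairs."""
    Xb, yb = border_samples(X, y, keep)
    centers, assign = kmeans(Xb, min(k, len(Xb)))
    model = []
    for c, center in enumerate(centers):
        Xc = [x for x, a in zip(Xb, assign) if a == c]
        yc = [t for t, a in zip(yb, assign) if a == c]
        if Xc:
            model.append((center, train_linear_svm(Xc, yc)))
    return model

def predict(model, x):
    """Assumed routing rule: use the SVM of the nearest cluster centre."""
    _, w = min(model, key=lambda cw: dist2(x, cw[0]))
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

For example, `model = fit_local_svms(X, y, k=2, keep=0.5)` followed by `predict(model, x)` classifies a new sample `x`. Under these assumptions, classification costs one nearest-centre lookup plus one dot product, which is the kind of saving over a kernel SVM that the abstract's strategy targets.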
