International Joint Conference on Neural Networks (IJCNN 2009)

A fast SVM training method for very large datasets


Abstract

Training a standard support vector machine (SVM) takes O(n^3) time and O(n^2) space, where n is the number of training samples, so it is computationally infeasible for very large datasets. Reducing the size of the training dataset is a natural way to address this problem. An SVM classifier depends only on its support vectors (SVs), which lie close to the separation boundary, so we need to retain the samples that are likely to be SVs. In this paper, we propose a method based on the edge detection technique to detect these samples. To preserve the overall distribution of the data, we also use a clustering algorithm such as K-means to compute cluster centroids. The samples selected by the edge detector and the cluster centroids together form the reconstructed training dataset. Because the reconstructed dataset is smaller, training is much faster, without degrading classification accuracy.
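The reduction pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the edge detector, so the sketch substitutes a simple stand-in: a sample counts as an "edge" sample if at least one of its k nearest neighbors carries a different label. This rule, the function names (`select_edge_samples`, `class_centroids`, `reduce_training_set`), the per-class use of K-means, and all parameter values are assumptions made for illustration, not the paper's method. The sketch uses NumPy and scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC


def select_edge_samples(X, y, k=5):
    """Boolean mask of samples that have at least one differently labeled
    sample among their k nearest neighbors.

    Assumption: this neighbor-label rule is only a stand-in for the
    edge detector mentioned in the abstract.
    """
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    # Column 0 is normally the query point itself, so skip it.
    neighbor_labels = y[idx[:, 1:]]
    return (neighbor_labels != y[:, None]).any(axis=1)


def class_centroids(X, y, clusters_per_class=10, seed=0):
    """Run K-means within each class and return the centroids with labels.

    Assumption: clustering per class is one way to give each centroid
    a label; the abstract does not say how centroids are labeled.
    """
    centers, labels = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        n_clusters = min(clusters_per_class, len(Xc))
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(Xc)
        centers.append(km.cluster_centers_)
        labels.append(np.full(n_clusters, c))
    return np.vstack(centers), np.concatenate(labels)


def reduce_training_set(X, y, k=5, clusters_per_class=10, seed=0):
    """Combine the edge samples and the class centroids into a smaller set."""
    mask = select_edge_samples(X, y, k)
    C, yc = class_centroids(X, y, clusters_per_class, seed)
    return np.vstack([X[mask], C]), np.concatenate([y[mask], yc])


if __name__ == "__main__":
    # Synthetic two-class data, used here only to exercise the sketch.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1.0, 1.0, (1000, 2)),
                   rng.normal(1.0, 1.0, (1000, 2))])
    y = np.repeat([0, 1], 1000)

    X_small, y_small = reduce_training_set(X, y)
    clf = SVC(kernel="rbf").fit(X_small, y_small)
    print(len(X), "->", len(X_small), "samples")
    print("accuracy on the full set:", clf.score(X, y))
```

The SVM is then trained only on the reduced set, which is where the savings would come from given the O(n^3) training cost. Note that the nearest-neighbor search in this sketch has its own cost on very large datasets, so it should be read as an outline of the idea rather than a tuned implementation.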
