Traditional support vector machine (SVM) algorithms have high time and space complexity, which makes it difficult for them to handle large-scale data effectively. To reduce the time and space complexity of SVM training, this paper proposes a fast SVM classification algorithm based on distance sorting. The algorithm first computes the sample centre of each of the two classes. It then computes, for every sample, the distance between that sample and the centre of the opposite class. Finally, it sorts the samples by this distance and selects a given proportion of the samples with the smallest distances as boundary samples. Because the boundary sample set covers the support vectors well and is much smaller than the original sample set, the algorithm can effectively shorten training time and save storage space while preserving SVM learning accuracy. Experiments on the UCI benchmark data sets and the 20-Newsgroups text classification data set show that, compared with previous support vector pre-selection algorithms, the proposed algorithm pre-selects support vectors faster and more accurately.
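The three-step selection procedure described above can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation. It rests on several assumptions the abstract does not spell out. Distances are Euclidean distances in input space, not kernel space. Squared distances are used because they give the same ordering. The selection ratio is applied separately within each class so that both classes stay represented. The function names `class_center` and `select_boundary_samples` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def class_center(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (same ranking as Euclidean distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_boundary_samples(X: List[Vector], y: List[int], ratio: float) -> List[int]:
    """Return sorted indices of the samples kept as boundary samples.

    Assumption: within each class, samples are ranked by their distance to
    the centre of the *other* class, and the closest ceil(ratio * class_size)
    samples are kept.
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("exactly two class labels are expected")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")

    # Step 1: centre of each class.
    centers = {c: class_center([x for x, t in zip(X, y) if t == c]) for c in labels}

    selected: List[int] = []
    for c in labels:
        other = labels[1] if c == labels[0] else labels[0]
        idx = [i for i, t in enumerate(y) if t == c]
        # Step 2: distance from each sample to the opposite class centre.
        # Step 3: sort ascending and keep the smallest-distance fraction.
        idx.sort(key=lambda i: squared_distance(X[i], centers[other]))
        k = math.ceil(ratio * len(idx))
        selected.extend(idx[:k])
    return sorted(selected)


if __name__ == "__main__":
    X = [[0, 0], [1, 0], [2, 0], [5, 0], [6, 0], [7, 0]]
    y = [-1, -1, -1, 1, 1, 1]
    # Keeps the samples nearest the gap between the classes: [1, 2, 3, 4]
    print(select_boundary_samples(X, y, 0.5))
```

In the toy example the class centres are (1, 0) and (6, 0), so the points at x = 1, 2 and x = 5, 6 lie closest to the opposite centre and are kept. The returned indices would then be passed to any standard SVM trainer in place of the full training set. That training step is not shown here.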