Journal: Concurrency and computation: practice and experience

Latent-lSVM classification of very high-dimensional and large-scale multi-class datasets

Abstract

We propose a new parallel learning algorithm for latent local support vector machines (SVMs), called latent-lSVM, for effectively classifying very high-dimensional and large-scale multi-class datasets. The common framework for text/image classification tasks, which uses the Bag-Of-(visual)-Words model for data representation, leads to hard classification problems with thousands of dimensions and hundreds of classes. Our latent-lSVM algorithm handles these complex tasks in two main steps. The first is to use latent Dirichlet allocation to assign each datapoint (text/image) to some topics (clusters) with the corresponding probabilities. This aims at reducing the number of classes and the number of datapoints in each cluster compared to the full dataset. The second is to learn, in parallel, nonlinear SVM models that classify the data clusters locally. The numerical test results on nine real datasets show that the latent-lSVM algorithm achieves very high accuracy compared to state-of-the-art algorithms. As an example of its effectiveness, it reaches an accuracy of 70.14% on the Book dataset, which has 100 000 individuals in an 89 821-dimensional input space and 661 classes, in 11.2 minutes on a PC with an Intel(R) Core i7-4790 CPU (3.6 GHz, 4 cores).
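The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, assigns each datapoint only to its most probable topic (the abstract mentions assignment to several topics with probabilities), uses a thread pool as a stand-in for the parallel training, and picks an RBF kernel as one possible nonlinear SVM. The function names, the toy data, and all parameter values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import SVC


def fit_latent_local_svm(X, y, n_topics=3, seed=0):
    """Step 1: cluster with LDA. Step 2: fit one nonlinear SVM per cluster."""
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    # Assumption: hard assignment to the single most probable topic.
    cluster = lda.fit_transform(X).argmax(axis=1)

    # Global majority class, used only if a cluster received no training data.
    values, counts = np.unique(y, return_counts=True)
    fallback = values[counts.argmax()]

    def train(k):
        idx = np.flatnonzero(cluster == k)
        if idx.size == 0:
            return fallback
        labels = np.unique(y[idx])
        if labels.size == 1:
            return labels[0]  # a single class needs no SVM
        return SVC(kernel="rbf", gamma="scale").fit(X[idx], y[idx])

    # Local models are independent, so they can be trained concurrently.
    with ThreadPoolExecutor() as pool:
        models = list(pool.map(train, range(n_topics)))
    return lda, models


def predict_latent_local_svm(lda, models, X):
    """Route each datapoint to its most probable topic's local model."""
    cluster = lda.transform(X).argmax(axis=1)
    pred = np.empty(X.shape[0], dtype=object)
    for k, model in enumerate(models):
        idx = np.flatnonzero(cluster == k)
        if idx.size == 0:
            continue
        pred[idx] = model.predict(X[idx]) if isinstance(model, SVC) else model
    return pred


# Toy bag-of-words counts (random, for illustration only).
rng = np.random.default_rng(0)
X_toy = rng.poisson(1.0, size=(120, 30)).astype(float)
y_toy = rng.integers(0, 4, size=120)

lda, models = fit_latent_local_svm(X_toy, y_toy)
pred = predict_latent_local_svm(lda, models, X_toy)
```

Each local SVM sees only the datapoints and classes that fall into its cluster, which is the reduction in problem size the abstract refers to.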
