Smart Cities Symposium

Scalable Parallel SVM on Cloud Clusters for Large Datasets Classification

Abstract

This paper proposes a new parallel support vector machine (PSVM) that is efficient in terms of time complexity. The support vector machine is one of the most popular classifiers for data analysis and pattern classification. However, an SVM requires a large amount of memory (on the order of 100 GB or more) to process big data (on the order of 1 TB or more). This paper proposes executing SVMs in parallel on several clusters to analyze and classify big data. In this approach, the data are divided into n equal partitions, and each partition is used by an individual cluster to train an SVM. The outcomes of the SVMs executed on the clusters are then combined by another SVM, referred to as the final SVM. The inputs to this final SVM are the support vectors (SVs) of the SVMs trained on the different clusters, and the desired output for each SV is its corresponding label. We evaluated the proposed method on high-performance computing (HPC) clusters and Amazon cloud clusters (ACC) using several benchmark datasets. Experimental results show that, compared to the existing stand-alone SVM, the proposed method is efficient in terms of training time, with a minimal error rate and memory requirement.
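The sketch below illustrates the partition-train-combine scheme described in the abstract: the data are split into n partitions, one SVM is trained per partition, and the final SVM is trained only on the pooled support vectors and their labels. It is a minimal illustration only, assuming scikit-learn's SVC as the base learner and a local process pool standing in for the clusters; the function names (train_partition, parallel_svm), kernel settings, and synthetic dataset are illustrative choices, not the authors' actual implementation.

```python
# A minimal sketch of the parallel-SVM idea, assuming scikit-learn's SVC as the
# base learner; local processes stand in for the paper's HPC/cloud clusters.
import numpy as np
from multiprocessing import Pool
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def train_partition(args):
    """Train one SVM on a data partition; return its support vectors and labels."""
    X_part, y_part = args
    clf = SVC(kernel="rbf", C=1.0).fit(X_part, y_part)
    sv = clf.support_vectors_          # support vectors found in this partition
    sv_labels = y_part[clf.support_]   # desired outputs of those support vectors
    return sv, sv_labels

def parallel_svm(X, y, n_partitions=4):
    # Divide the training data into n roughly equal partitions.
    parts = list(zip(np.array_split(X, n_partitions),
                     np.array_split(y, n_partitions)))
    # Train one SVM per partition in parallel.
    with Pool(n_partitions) as pool:
        results = pool.map(train_partition, parts)
    # The final SVM is trained only on the pooled support vectors.
    sv_all = np.vstack([sv for sv, _ in results])
    lbl_all = np.concatenate([lbl for _, lbl in results])
    return SVC(kernel="rbf", C=1.0).fit(sv_all, lbl_all)

if __name__ == "__main__":
    # Synthetic data for demonstration; the paper uses benchmark datasets.
    X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    final_svm = parallel_svm(X_tr, y_tr, n_partitions=4)
    print("Held-out accuracy:", final_svm.score(X_te, y_te))
```

The design point this sketch captures is that only the support vectors of each partition, rather than the full partition, are forwarded to the final SVM, which is why the approach reduces both training time and memory use relative to a stand-alone SVM trained on all of the data.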
