Smart Cities Symposium

Scalable Parallel SVM on Cloud Clusters for Large Datasets Classification

Abstract

This paper proposes a new parallel support vector machine (PSVM) that is efficient in terms of time complexity. The support vector machine is one of the most popular classifiers for data analysis and pattern classification. However, an SVM requires a large amount of memory (on the order of 100 GB or more) to process big data (on the order of 1 TB or more). This paper proposes executing SVMs in parallel on several clusters to analyze and classify big data. In this approach, the data are divided into n equal partitions, and each partition is used by an individual cluster to train an SVM. The outcomes of the SVMs executed on the clusters are then combined by another SVM, referred to as the final SVM. The inputs to this final SVM are the support vectors (SVs) of the SVMs trained on the different clusters, and the desired output for each SV is its corresponding label. We evaluated the proposed method on high-performance computing (HPC) clusters and Amazon cloud clusters (ACC) using several benchmark datasets. Experimental results show that, compared to the existing stand-alone SVM, the proposed method is efficient in terms of training time, with a minimal error rate and memory requirement.
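The sketch below illustrates the partition-train-combine scheme described in the abstract: the data are split into n partitions, one SVM is trained per partition, and the final SVM is trained only on the pooled support vectors and their labels. It is a minimal illustration only, assuming scikit-learn's SVC as the base learner and a local process pool standing in for the clusters; the function names (train_partition, parallel_svm), kernel settings, and synthetic dataset are illustrative choices, not the authors' actual implementation.

```python
# A minimal sketch of the parallel-SVM idea, assuming scikit-learn's SVC as the
# base learner; local processes stand in for the paper's HPC/cloud clusters.
import numpy as np
from multiprocessing import Pool
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def train_partition(args):
    """Train one SVM on a data partition; return its support vectors and labels."""
    X_part, y_part = args
    clf = SVC(kernel="rbf", C=1.0).fit(X_part, y_part)
    sv = clf.support_vectors_          # support vectors found in this partition
    sv_labels = y_part[clf.support_]   # desired outputs of those support vectors
    return sv, sv_labels

def parallel_svm(X, y, n_partitions=4):
    # Divide the training data into n roughly equal partitions.
    parts = list(zip(np.array_split(X, n_partitions),
                     np.array_split(y, n_partitions)))
    # Train one SVM per partition in parallel.
    with Pool(n_partitions) as pool:
        results = pool.map(train_partition, parts)
    # The final SVM is trained only on the pooled support vectors.
    sv_all = np.vstack([sv for sv, _ in results])
    lbl_all = np.concatenate([lbl for _, lbl in results])
    return SVC(kernel="rbf", C=1.0).fit(sv_all, lbl_all)

if __name__ == "__main__":
    # Synthetic data for demonstration; the paper uses benchmark datasets.
    X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    final_svm = parallel_svm(X_tr, y_tr, n_partitions=4)
    print("Held-out accuracy:", final_svm.score(X_te, y_te))
```

The design point this sketch captures is that only the support vectors of each partition, rather than the full partition, are forwarded to the final SVM, which is why the approach reduces both training time and memory use relative to a stand-alone SVM trained on all of the data.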
