Journal: IEEE Transactions on Parallel and Distributed Systems

A GPU-Architecture Optimized Hierarchical Decomposition Algorithm for Support Vector Machine Training


Abstract

In the last decade, several GPU implementations of Support Vector Machine (SVM) training with nonlinear kernels have been published, some of them even with source code. The most effective ones are based on Sequential Minimal Optimization (SMO). They decompose the constrained quadratic problem into a series of smallest possible subproblems, which are then solved analytically. For large datasets, the majority of the elapsed time is spent on a large number of matrix-vector multiplications, which cannot be computed efficiently on current GPUs because of limited memory bandwidth. In this paper, we introduce a novel GPU approach to SVM training that we call Optimized Hierarchical Decomposition SVM (OHD-SVM). It uses a hierarchical decomposition iterative algorithm that fits the actual GPU architecture better. The low decomposition level uses a single GPU multiprocessor to efficiently solve a local subproblem. Nowadays, a single GPU multiprocessor can run a thousand or more threads that are able to synchronize quickly, which makes it an ideal platform for a single-kernel SMO-based local solver with fast local iterations. The high decomposition level updates the gradients of the entire training set and selects a new local working set. The gradient update requires many kernel values that are costly to compute. However, solving a large local subproblem allows the kernel values to be computed via a matrix-matrix multiplication, which is much more efficient than the matrix-vector multiplication used in previously published implementations. Along with a description of our implementation, the paper includes an exact comparison of five publicly available C++ GPU implementations of SVM training. As is usual in most recent papers, we consider the binary classification task and the RBF kernel function. According to the results measured on a wide set of publicly available datasets, our proposed approach significantly outperformed the other methods on all datasets. The biggest difference was on the largest dataset, where we achieved a speed-up of up to 12 times over the fastest previously published GPU implementation. Moreover, our OHD-SVM is the only one that can handle dense as well as sparse datasets. Along with this paper, we have published the source code at https://github.com/OrcusCZ/OHD-SVM.
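To illustrate the matrix-matrix formulation mentioned in the abstract, below is a minimal CUDA/cuBLAS sketch of how a block of RBF kernel values K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2) for a local working set against the whole training set could be obtained from a single GEMM followed by an element-wise finish, using ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i.x_j. All names (computeKernelBlock, rbfFinish, workingSet, gamma) are illustrative assumptions for the dense case and are not taken from the OHD-SVM sources.

    // Sketch: one GEMM yields all dot products between the working-set rows
    // and the full training set; a light element-wise kernel then turns them
    // into RBF kernel values. This is the matrix-matrix alternative to
    // issuing one matrix-vector product per selected sample.
    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    // After the GEMM, K[i * n + j] holds dot(xws_i, x_j); replace it with
    // exp(-gamma * (||xws_i||^2 + ||x_j||^2 - 2 * dot)).
    __global__ void rbfFinish(float *K, const float *sqNorms, const int *workingSet,
                              int wsSize, int n, float gamma)
    {
        int j = blockIdx.x * blockDim.x + threadIdx.x;   // training-sample index
        int i = blockIdx.y;                              // working-set row
        if (i < wsSize && j < n) {
            float d2 = sqNorms[workingSet[i]] + sqNorms[j] - 2.0f * K[i * n + j];
            K[i * n + j] = expf(-gamma * d2);
        }
    }

    // X:       n x d training matrix, dense, row-major, already on the GPU
    // Xws:     wsSize x d rows gathered for the current working set
    // sqNorms: precomputed squared norms ||x_j||^2 of all n samples
    // K:       output block of kernel values, wsSize x n, row-major
    void computeKernelBlock(cublasHandle_t handle,
                            const float *X, const float *Xws,
                            const float *sqNorms, const int *workingSet,
                            float *K, int n, int d, int wsSize, float gamma)
    {
        const float one = 1.0f, zero = 0.0f;
        // cuBLAS is column-major, so we request K^T (n x wsSize), which is
        // exactly the row-major wsSize x n block K = Xws * X^T.
        cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                    n, wsSize, d,
                    &one, X, d, Xws, d,
                    &zero, K, n);
        dim3 block(256), grid((n + 255) / 256, wsSize);
        rbfFinish<<<grid, block>>>(K, sqNorms, workingSet, wsSize, n, gamma);
    }

Because the GEMM reuses each loaded element of X across all working-set rows, it is far less bandwidth-bound than repeated matrix-vector products, which is why a larger local subproblem maps better to the GPU.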
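Likewise, a minimal sketch of the kind of gradient update the high decomposition level performs once the local subproblem is solved: every gradient entry is corrected by the change of the working-set dual variables, weighted by the labels and the corresponding kernel values. The formulation and names (updateGradient, deltaAlpha, yWs) are assumptions for illustration, not the actual OHD-SVM code.

    // grad:       gradient of the dual objective for all n training samples
    // K:          wsSize x n block of kernel values from the sketch above
    // deltaAlpha: change of the working-set dual variables after the local solve
    // yWs:        labels (+1 / -1) of the working-set samples
    __global__ void updateGradient(float *grad, const float *K,
                                   const float *deltaAlpha, const float *yWs,
                                   int wsSize, int n)
    {
        int j = blockIdx.x * blockDim.x + threadIdx.x;  // training-sample index
        if (j < n) {
            float acc = 0.0f;
            for (int i = 0; i < wsSize; ++i)            // sum over the working set
                acc += yWs[i] * deltaAlpha[i] * K[i * n + j];
            grad[j] += acc;
        }
    }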

