IEEE Transactions on Pattern Analysis and Machine Intelligence

Fast SVM training algorithm with decomposition on very large data sets



Abstract

Training a support vector machine on a huge data set with thousands of classes is a challenging problem. This paper proposes an efficient algorithm to solve this problem. The key idea is to introduce a parallel optimization step to quickly remove most of the nonsupport vectors, where block diagonal matrices are used to approximate the original kernel matrix so that the original problem can be split into hundreds of subproblems, which can be solved more efficiently. In addition, some effective strategies, such as kernel caching and efficient computation of the kernel matrix, are integrated to speed up the training process. Our analysis of the proposed algorithm shows that its time complexity grows linearly with the number of classes and the size of the data set. In the experiments, many appealing properties of the proposed algorithm are investigated, and the results show that the proposed algorithm scales much better than LIBSVM, SVM^light, and SVMTorch. Moreover, good generalization performance is also achieved on several large databases.
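The decomposition idea from the abstract can be sketched as follows. This is a minimal, hypothetical illustration in Python/NumPy, not the paper's implementation: it assumes an RBF kernel, replaces the paper's solver with a simplified bias-free dual optimized by projected gradient ascent, and all function names are invented for the sketch.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def train_dual(X, y, C=1.0, gamma=1.0, steps=500, lr=0.01):
    # Projected-gradient ascent on a simplified (bias-free) SVM dual:
    #   max  sum(alpha) - 0.5 * alpha^T Q alpha,  0 <= alpha <= C,
    # where Q_ij = y_i * y_j * K(x_i, x_j).
    Q = (y[:, None] * y[None, :]) * rbf_kernel(X, X, gamma)
    alpha = np.zeros(len(y))
    for _ in range(steps):
        alpha += lr * (1.0 - Q @ alpha)    # gradient of the dual objective
        np.clip(alpha, 0.0, C, out=alpha)  # project onto the box constraints
    return alpha

def block_diagonal_filter(X, y, n_blocks=4, C=1.0, gamma=1.0, tol=1e-6):
    # Approximate the kernel matrix by its block-diagonal part: solve one
    # small subproblem per block, and keep only points whose multipliers
    # are nonzero -- the candidate support vectors.
    keep = []
    for idx in np.array_split(np.random.permutation(len(y)), n_blocks):
        alpha = train_dual(X[idx], y[idx], C=C, gamma=gamma)
        keep.extend(idx[alpha > tol])
    return np.array(keep)
```

The surviving candidates can then be trained together in a final pass. Splitting n points into k blocks reduces each subproblem's kernel matrix to (n/k)² entries, which is why this filtering pass is far cheaper than working with the full n×n matrix; the subproblems are also independent, so they can be solved in parallel, as the abstract describes.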
