
Large-scale machine learning using kernel methods.



Abstract

Kernel methods, such as Support Vector Machines (SVMs), are a core machine-learning technology. They enjoy strong theoretical foundations and have achieved excellent empirical success in many pattern-recognition applications. However, when kernel methods are applied to emerging large-scale applications such as video surveillance, multimedia information retrieval, and web mining, they face the challenges of ineffective and inefficient training. In this dissertation, we explore these challenges and propose strategies for addressing them.

We first investigate the imbalanced-training challenge, which causes the training of kernel methods to be ineffective. The problem occurs when the training instances of the target class are significantly outnumbered by the other training instances. In such situations, we show that the class boundary trained by SVMs can be severely skewed toward the target class. To tackle this challenge, we propose applying a conformal transformation to the kernel function in Reproducing Kernel Hilbert Space.

The training performance of kernel methods depends greatly on the chosen kernel function or matrix, which defines a pairwise similarity measure between two data instances. We therefore develop an algorithm that formulates a context-dependent distance function for measuring such similarity, and we demonstrate that the learned distance function improves performance on kernel-based clustering and classification tasks. Moreover, we investigate situations where the similarity measure used to formulate the kernel does not induce a positive semi-definite (psd) kernel matrix, and hence cannot be used directly for training with kernel methods. We propose an analytical framework for evaluating several representative spectrum-transformation methods, which convert such an indefinite similarity matrix into a psd one.

Finally, we address the efficiency of kernel methods in order to achieve fast training on massive data, focusing in particular on Support Vector Machines. Traditional SVM solvers suffer from a widely known scalability problem. We propose an incremental algorithm that performs approximate matrix-factorization operations to speed up SVM training. Two approximate factorization schemes, Kronecker and incomplete Cholesky, are utilized within the primal-dual interior-point method (IPM) to directly solve the quadratic optimization problem in SVMs.

Through theoretical analysis and extensive empirical studies, we show that our proposed approaches perform more effectively and efficiently than traditional methods.
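To make the spectrum-transformation idea concrete, below is a minimal sketch (in Python, with illustrative names not taken from the dissertation) of one representative method, often called spectrum clip: zero out the negative eigenvalues of an indefinite similarity matrix so that the result is a valid psd kernel matrix.

```python
import numpy as np

def spectrum_clip(S):
    """Return a psd version of a symmetric similarity matrix S by
    clipping its negative eigenvalues to zero (spectrum clip)."""
    S = (S + S.T) / 2.0                # symmetrize to guard against noise
    w, U = np.linalg.eigh(S)           # spectral decomposition S = U diag(w) U^T
    w_clipped = np.clip(w, 0.0, None)  # drop the negative part of the spectrum
    return U @ np.diag(w_clipped) @ U.T

# Example: a symmetric similarity matrix with one negative eigenvalue
S = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.9],
              [0.2, 0.9, 1.0]])
K = spectrum_clip(S)
print(np.linalg.eigvalsh(K))  # all eigenvalues are now >= 0 (up to round-off)
```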
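Likewise, here is a minimal sketch of a pivoted incomplete Cholesky factorization, one of the two approximation schemes mentioned above, under the assumption of a generic kernel function (the `rbf` below is illustrative, not the dissertation's API). It builds a low-rank factor G with G @ G.T ≈ K without ever materializing the full kernel matrix, so a solver can work with the factor instead of the dense matrix.

```python
import numpy as np

def incomplete_cholesky(kernel, X, rank, tol=1e-8):
    """Pivoted incomplete Cholesky: compute G (n x k, k <= rank) such that
    G @ G.T approximates K[i, j] = kernel(X[i], X[j]) without forming K."""
    n = len(X)
    G = np.zeros((n, rank))
    d = np.array([kernel(x, x) for x in X])  # residual diagonal of K
    for k in range(rank):
        i = int(np.argmax(d))                # greedy pivot: largest residual
        if d[i] <= tol:                      # remaining error is negligible
            return G[:, :k]
        col = np.array([kernel(X[j], X[i]) for j in range(n)])  # pivot column of K
        G[:, k] = (col - G[:, :k] @ G[i, :k]) / np.sqrt(d[i])
        d -= G[:, k] ** 2                    # update residual diagonal
    return G

# Toy usage with an RBF kernel (names are assumptions for illustration)
rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))
X = np.random.default_rng(0).normal(size=(200, 5))
G = incomplete_cholesky(rbf, X, rank=30)
K = np.array([[rbf(a, b) for b in X] for a in X])
print(np.linalg.norm(K - G @ G.T))           # small Frobenius approximation error
```

In an interior-point SVM solver, each Newton step touches the kernel matrix only through products and linear solves; replacing K by a low-rank factor lets those be carried out via the Sherman-Morrison-Woodbury identity at a cost that, for fixed rank, grows linearly rather than cubically in the number of training instances.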

Bibliographic Record

  • Author

    Wu, Gang

  • Author's Affiliation

    University of California, Santa Barbara

  • Degree-Granting Institution: University of California, Santa Barbara
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 168 p.
  • Total Pages: 168
  • Format: PDF
  • Language: eng
  • Classification (CLC): Automation and Computer Technology

