Optimization for Regression, PCA, and SVM: Optimality and Scalability.


Abstract

As profit margins are squeezed, even a slight improvement in solution quality is critical in many industries and academic fields. At the same time, growing data sizes make the ability to solve large-scale problems increasingly important in many fields. However, many optimization problems in machine learning are currently solved non-optimally for lack of better algorithms. This thesis addresses this issue by designing scalable, globally optimal algorithms for hard-to-solve or large-scale machine learning problems.

While the ultimate goal is to provide optimal solutions by means of scalable algorithms, each of the three chapters has a different focus depending on the type of problem. Chapters 1 and 2 focus on providing optimal or improved solutions for hard-to-solve problems, whereas Chapter 3 focuses on improving computational efficiency so that large-scale problems can be solved. The problems studied in this thesis are important parts of the statistical learning process. Feature selection is crucial for constructing statistical learning models and is carried out by principal component analysis (PCA) and regression subset selection, as in Chapters 1 and 2. Furthermore, with the growing availability of large-scale data, the need for tractable machine learning algorithms, such as the one in Chapter 3, keeps increasing.

Chapter 1 presents mixed integer programs (MIPs) for finding the best subset of variables in multiple linear regression; the formulations are the first mathematical programming models that directly optimize the mean absolute error and the mean squared error. Computational experiments show that the proposed models and algorithms significantly improve solution quality (by up to 30% over stepwise heuristics on the selected data). The MIP models can also easily incorporate logical constraints, which stepwise heuristics and most other algorithms cannot.

Chapter 2 studies absolute-error optimization for PCA and develops convergent algorithms based on iteratively reweighted least squares, singular value decomposition, and mathematical programming. One of the convergence results is generalized to show that the eigenvalues of a convergent sequence of symmetric matrices are themselves convergent. Computational experiments show that both algorithms outperform the benchmark algorithms from the literature in the presence of significant outliers (up to a 15% improvement on the selected data).

Chapter 3 develops a general algorithmic framework, called Aggregate and Iterative Disaggregate (AID), for machine learning problems such as least absolute deviation regression and the support vector machine (SVM). The algorithm is designed for large-scale problems and is based on clustering and data aggregation. It is shown to be monotonically convergent for some of the selected problems and to substantially reduce computational time as data size increases (up to 9 times faster than state-of-the-art packages and software on the selected data).
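As a rough illustration of the Chapter 1 idea, a best-subset MIP with a mean absolute error objective is commonly written with big-M linking constraints along the following lines. This is only a generic sketch under standard assumptions (binary selectors z_j, residual bounds e_i, a subset size limit k, and a sufficiently large constant M), not necessarily the exact formulation developed in the thesis.

```latex
\begin{align*}
\min_{\beta,\, z,\, e} \quad & \frac{1}{n}\sum_{i=1}^{n} e_i
  && \text{(mean absolute error)} \\
\text{s.t.} \quad
  & e_i \ge y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij},
  && i = 1,\dots,n, \\
  & e_i \ge -\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\Bigr),
  && i = 1,\dots,n, \\
  & -M z_j \le \beta_j \le M z_j,
  && j = 1,\dots,p \quad \text{(feature $j$ usable only if selected)}, \\
  & \sum_{j=1}^{p} z_j \le k, \qquad z_j \in \{0,1\}, \qquad e_i \ge 0.
\end{align*}
```

An analogous mean squared error version would replace the objective with \(\frac{1}{n}\sum_i e_i^2\), giving a mixed integer quadratic program; logical constraints (e.g., "select feature A only if feature B is selected") can be added directly as linear constraints on the z variables.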
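The AID framework of Chapter 3 alternates between solving the problem on aggregated (clustered) data and disaggregating clusters where the aggregated solution is not yet reliable. The following is a minimal, illustrative sketch of such a loop for least absolute deviation regression; the function names, the KMeans-based initial clustering, and the sign-based split rule are assumptions made for this example and may differ from the thesis's actual framework and guarantees.

```python
# Minimal sketch of an Aggregate-and-Iterative-Disaggregate (AID) style loop for
# least absolute deviation (LAD) regression. The clustering choice, split rule,
# and function names here are illustrative assumptions, not the thesis's exact method.
import numpy as np
from scipy.optimize import linprog
from sklearn.cluster import KMeans


def weighted_lad(X, y, w):
    """Solve min_beta sum_i w_i * |y_i - X_i beta| as a linear program."""
    m, p = X.shape
    # Decision variables: [beta (p, free), e (m, nonnegative residual bounds)]
    c = np.concatenate([np.zeros(p), w])
    A_ub = np.block([[-X, -np.eye(m)],     # e_i >= y_i - X_i beta
                     [ X, -np.eye(m)]])    # e_i >= X_i beta - y_i
    b_ub = np.concatenate([-y, y])
    bounds = [(None, None)] * p + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:p]


def aid_lad(X, y, n_clusters=20, max_iter=10, seed=0):
    """Fit LAD regression by solving on cluster centroids, then splitting
    clusters whose member residuals disagree in sign, until stable.
    X should already contain a column of ones if an intercept is wanted."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    for _ in range(max_iter):
        # Aggregate: one weighted observation (the centroid) per cluster.
        ids = np.unique(labels)
        Xc = np.array([X[labels == c].mean(axis=0) for c in ids])
        yc = np.array([y[labels == c].mean() for c in ids])
        w = np.array([(labels == c).sum() for c in ids], dtype=float)
        beta = weighted_lad(Xc, yc, w)
        # Disaggregate: split clusters whose residual signs are mixed, since
        # absolute values sum exactly over a cluster only when all residuals
        # share a sign.
        resid = y - X @ beta
        next_label, changed = labels.max() + 1, False
        for c in ids:
            members = np.where(labels == c)[0]
            pos = resid[members] >= 0
            if 0 < pos.sum() < len(members):
                labels[members[~pos]] = next_label
                next_label += 1
                changed = True
        if not changed:   # aggregated solution consistent with the full data
            break
    return beta
```

Because each iteration works with one weighted observation per cluster, the linear program stays small until disaggregation is actually needed, which is the source of the speedups reported for large data sets.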

Bibliographic details

  • Author: Park, Young Woong.
  • Author affiliation: Northwestern University.
  • Degree grantor: Northwestern University.
  • Subjects: Operations research; Industrial engineering.
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 150 p.
  • Total pages: 150
  • Format: PDF
  • Language: eng
  • CLC classification:
  • Keywords:
