
Efficient Methods for Large-Scale Empirical Risk Minimization


Abstract

Empirical risk minimization (ERM) problems express optimal classifiers as solutions of optimization problems in which the objective is the sum of a very large number of sample costs. An evident obstacle to using traditional descent algorithms for this class of problems is their prohibitive computational complexity when the number of component functions in the ERM problem is large. The main goal of this thesis is to study different approaches to solving these large-scale ERM problems.

We begin by focusing on incremental and stochastic methods, which split the training samples into smaller sets across time to lower the computational burden of traditional descent algorithms. We develop and analyze convergent stochastic variants of quasi-Newton methods which do not require computation of the objective Hessian and approximate the curvature using only gradient information. We show that the curvature approximation in stochastic quasi-Newton methods leads to faster convergence relative to first-order stochastic methods when the problem is ill-conditioned. We culminate with the introduction of an incremental method that exploits memory to achieve a superlinear convergence rate, which is the best known convergence rate for an incremental method.

An alternative strategy for lowering the prohibitive cost of solving large-scale ERM problems is decentralized optimization, whereby samples are separated not across time but across multiple nodes of a network. In this regime, the main contribution of this thesis is in incorporating second-order information of the aggregate risk corresponding to the samples of all nodes in the network in a way that can be implemented in a distributed fashion. We also explore the separation of samples across both time and space to reduce the computational and communication cost of solving large-scale ERM problems. We study this path by introducing a decentralized stochastic method that incorporates the idea of stochastic averaging gradients, leading to a method with low computational complexity and a fast linear convergence rate.

We then introduce a rethinking of ERM in which we consider not a partition of the training set, as in the case of stochastic and distributed optimization, but a nested collection of subsets that we grow geometrically. The key insight is that the optimal argument associated with a training subset of a certain size is not far from the optimal argument associated with a larger training subset. Based on this insight, we present adaptive sample size schemes which start with a small number of samples and solve the corresponding ERM problem to its statistical accuracy. The sample size is then grown geometrically, and the solution of the previous ERM problem is used as a warm start for the new one. Theoretical analyses show that the use of adaptive sample size methods reduces the overall computational cost of achieving the statistical accuracy of the whole dataset for a broad range of deterministic and stochastic first-order methods. We further show that if we couple the adaptive sample size scheme with Newton's method, it is possible to consider successive doublings of the training set and perform a single Newton iteration in between. This is possible because of the interplay between the statistical accuracy and the quadratic convergence region of these problems, and it yields a method that is guaranteed to solve an ERM problem by performing just two passes over the dataset.
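To make the adaptive sample size idea from the last paragraph concrete, the following is a minimal sketch on a synthetic logistic-regression ERM: the training subset starts small, is doubled geometrically, and each subproblem is warm-started from the previous solution. The function names (solve_erm, adaptive_sample_size_erm) and the plain gradient-descent inner solver are illustrative assumptions for this sketch, not the specific methods developed in the thesis.

import numpy as np

def logistic_loss_grad(w, X, y, reg):
    """Gradient of the average logistic loss with L2 regularization (labels in {-1, +1})."""
    z = X @ w
    probs = 1.0 / (1.0 + np.exp(-y * z))
    return -(X.T @ (y * (1.0 - probs))) / len(y) + reg * w

def solve_erm(w0, X, y, reg, lr=0.5, iters=200):
    """Plain gradient descent on one ERM subproblem (stand-in for any inner solver)."""
    w = w0.copy()
    for _ in range(iters):
        w -= lr * logistic_loss_grad(w, X, y, reg)
    return w

def adaptive_sample_size_erm(X, y, reg, n0=64):
    """Start from a small subset, solve it approximately, then double the sample
    size and warm-start the next subproblem from the previous solution."""
    n, d = X.shape
    w = np.zeros(d)
    m = n0
    while True:
        w = solve_erm(w, X[:m], y[:m], reg)   # warm start from previous solution
        if m == n:
            return w
        m = min(2 * m, n)                     # geometric growth of the training subset

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 4096, 20
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = np.sign(X @ w_true + 0.1 * rng.normal(size=n))
    w_hat = adaptive_sample_size_erm(X, y, reg=1e-3)
    cosine = w_hat @ w_true / (np.linalg.norm(w_hat) * np.linalg.norm(w_true))
    print("alignment with true separator:", float(cosine))

The inner solver here is deliberately generic; the point of the sketch is the outer loop, in which each doubled subset is solved starting from the previous solution rather than from scratch.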

Bibliographic details

  • Author

    Mokhtari, Aryan.

  • Author affiliation

    University of Pennsylvania.

  • Degree grantor: University of Pennsylvania.
  • Subject: Electrical engineering.
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 318 p.
  • Total pages: 318
  • Format: PDF
  • Language: eng
  • CLC classification:
  • Keywords:

