JMLR: Workshop and Conference Proceedings

Large Scale Empirical Risk Minimization via Truncated Adaptive Newton Method



Abstract

Most second-order methods are inapplicable to large scale empirical risk minimization (ERM) problems because both the number of samples $N$ and the number of parameters $p$ are large: large $N$ makes Hessians costly to evaluate, and large $p$ makes them costly to invert. This paper proposes a novel adaptive sample size second-order method, which reduces the cost of computing the Hessian by solving a sequence of ERM problems corresponding to a subset of samples, and lowers the cost of computing the Hessian inverse using a truncated eigenvalue decomposition. Although the sample size grows at a geometric rate, a single iteration in each growth stage is shown to suffice to track the optimal classifier to within its statistical accuracy. This results in convergence to the optimal classifier associated with the whole sample set in a number of iterations that scales with $\log(N)$. The use of a truncated eigenvalue decomposition keeps the cost of each iteration of order $p^2$. Theoretical performance gains manifest in practical implementations.
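To make the two ingredients of the abstract concrete, the following is a minimal Python sketch, not the authors' exact algorithm: `grad_hess_fn(w, n)` is a hypothetical callback returning the gradient and Hessian of the ERM objective on the first `n` samples, and the truncation rank `k`, the curvature floor, and the unit step size are simplifying assumptions.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def truncated_newton_direction(grad, hess, k):
    """Approximate the Newton direction H^{-1} g from the k dominant
    eigenpairs of H; directions outside the retained subspace are scaled
    by the smallest kept eigenvalue (one simple truncation convention)."""
    vals, vecs = eigsh(hess, k=k, which='LM')  # k largest-magnitude eigenpairs
    coef = vecs.T @ grad
    floor = vals.min()
    # H^{-1} g  ~=  V diag(1/lambda) V^T g + (g - V V^T g) / floor
    return vecs @ (coef / vals) + (grad - vecs @ coef) / floor

def adaptive_sample_newton(grad_hess_fn, w0, N, n0=128, growth=2, k=20):
    """Adaptive sample size loop: grow the sample count geometrically and
    take one truncated Newton step per stage, giving O(log N) stages."""
    w, n = w0.copy(), n0
    while n <= N:
        g, H = grad_hess_fn(w, n)  # gradient/Hessian on the first n samples
        w -= truncated_newton_direction(g, H, k)
        n *= growth                # geometric growth of the sample size
    return w
```

Because `eigsh` extracts only `k` eigenpairs via matrix-vector products rather than forming a full inverse, each step costs on the order of $p^2$ instead of the $p^3$ of a dense Hessian inversion, consistent with the per-iteration cost claimed in the abstract.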