【24h】

How to Escape Saddle Points Efficiently?

机译:如何有效逃避鞍点?

获取原文

摘要

Non-convex optimization is ubiquitous in modem machine learning applications. Gradient descent based algorithms (such as gradient descent or Nesterov's accelerated gradient descent) are widely used in practice since they have been observed to perform well on these problems. From a theoretical standpoint however, this is quite surprising since these nonconvex problems are teeming with highly suboptimal saddle points, and it is well known that gradient descent (and variants) can get stuck in such saddle points. A key question that arises in resolving this apparent paradox is whether gradient descent based algorithms escape saddle points where the Hessian has negative eigenvalues and converge to points where Hessian is positive semidefinite (such points are called second-order stationary points) efficiently. We answer this question in the affirmative by showing the following: (1) Perturbed gradient descent converges to second-order stationary points in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free")- The convergence rate of this procedure matches the wellknown convergence rate of gradient descent to first-order stationary points, up to log factors, and (2) A variant of Nesterov's accelerated gradient descent converges to second-order stationary points at a faster rate than perturbed gradient descent. The key technical components behind these results are (1) understanding geometry around saddle points and (2) designing a potential function for Nesterov's accelerated gradient descent for non-convex problems.
机译:非凸优化在现代机器学习应用程序中无处不在。基于梯度下降的算法(例如梯度下降或Nesterov的加速梯度下降)在实践中得到了广泛使用,因为已观察到它们在这些问题上的表现很好。但是,从理论的角度来看,这是非常令人惊讶的,因为这些非凸问题在高度次优的鞍点上充斥着,众所周知,梯度下降(和变体)会卡在这些鞍点上。解决此明显矛盾的一个关键问题是,基于梯度下降的算法是否可以有效地避开Hessian具有负特征值的鞍点并收敛到Hessian为正半定点的点(此类点称为二阶固定点)。我们通过显示以下内容来肯定地回答这个问题:(1)扰动梯度下降在多次迭代中收敛到二阶固定点,这些迭代仅在对数上取决于维数(即,几乎是“无维数”) -此过程的收敛速度与众所周知的梯度下降到一阶固定点的收敛速度匹配,直到对数因子,并且(2)Nesterov的加速梯度下降的一种变体以比以下速度更快的速度收敛到第二阶固定点。扰动梯度下降。这些结果背后的关键技术要素是(1)了解鞍点周围的几何形状,以及(2)设计Nesterov的非凸问题的加速梯度下降的潜在函数。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号