JMLR: Workshop and Conference Proceedings

On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes


Abstract

Stochastic gradient descent is the method of choice for large-scale optimization of machine learning objective functions. Yet, its performance is highly variable and depends heavily on the choice of the stepsizes. This has motivated a large body of research on adaptive stepsizes. However, there is currently a gap in our theoretical understanding of these methods, especially in the non-convex setting. In this paper, we start closing this gap: we theoretically analyze, in the convex and non-convex settings, a generalized version of the AdaGrad stepsizes. We give sufficient conditions for these stepsizes to achieve almost sure asymptotic convergence of the gradients to zero, proving the first guarantee for generalized AdaGrad stepsizes in the non-convex setting. Moreover, we show that these stepsizes allow the algorithm to automatically adapt to the level of noise of the stochastic gradients in both the convex and non-convex settings, interpolating between O(1/T) and O(1/sqrt(T)) rates, up to logarithmic terms.
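The abstract does not reproduce the stepsize formula itself. As a rough illustration only, the Python sketch below runs SGD with an AdaGrad-norm-style global stepsize of the assumed form eta_t = alpha / (beta + sum of squared stochastic-gradient norms)^(1/2 + eps); the constants alpha, beta, eps and the noisy quadratic test problem are hypothetical choices for the example, not taken from the paper.

```python
import numpy as np

def sgd_adagrad_norm(grad_fn, x0, alpha=1.0, beta=1.0, eps=0.0, T=1000, seed=0):
    """SGD with a global AdaGrad-norm-style adaptive stepsize (illustrative sketch).

    Stepsize used here (assumed form, not quoted from the paper):
        eta_t = alpha / (beta + sum_{i <= t} ||g_i||^2) ** (0.5 + eps)
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    sum_sq = 0.0
    for _ in range(T):
        g = grad_fn(x, rng)                  # stochastic gradient at the current iterate
        sum_sq += float(np.dot(g, g))        # accumulate squared gradient norms
        eta = alpha / (beta + sum_sq) ** (0.5 + eps)
        x -= eta * g                         # SGD step with the adaptive stepsize
    return x

# Hypothetical test problem: noisy gradients of f(x) = 0.5 * ||x||^2.
def noisy_quadratic_grad(x, rng, noise=0.1):
    return x + noise * rng.standard_normal(x.shape)

if __name__ == "__main__":
    x_final = sgd_adagrad_norm(noisy_quadratic_grad, x0=np.ones(5), T=5000)
    print("final iterate norm:", np.linalg.norm(x_final))
```

With eps = 0 this reduces to the standard AdaGrad-norm stepsize; the accumulated squared norms make eta_t shrink faster when the stochastic gradients are noisy and more slowly when they are nearly deterministic, which is the adaptation to noise the abstract refers to.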
