JMLR: Workshop and Conference Proceedings

On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes


Abstract

Stochastic gradient descent is the method of choice for large-scale optimization of machine learning objective functions. Yet, its performance is highly variable and depends heavily on the choice of the stepsizes. This has motivated a large body of research on adaptive stepsizes. However, there is currently a gap in our theoretical understanding of these methods, especially in the non-convex setting. In this paper, we start closing this gap: we theoretically analyze, in the convex and non-convex settings, a generalized version of the AdaGrad stepsizes. We give sufficient conditions for these stepsizes to achieve almost sure asymptotic convergence of the gradients to zero, proving the first guarantee for generalized AdaGrad stepsizes in the non-convex setting. Moreover, we show that these stepsizes allow the algorithm to automatically adapt to the level of noise of the stochastic gradients in both the convex and non-convex settings, interpolating between O(1/T) and O(1/sqrt(T)) rates, up to logarithmic terms.
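The abstract does not reproduce the stepsize formula itself. As a rough illustration only, the Python sketch below runs SGD with an AdaGrad-norm-style global stepsize of the assumed form eta_t = alpha / (beta + sum of squared stochastic-gradient norms)^(1/2 + eps); the constants alpha, beta, eps and the noisy quadratic test problem are hypothetical choices for the example, not taken from the paper.

```python
import numpy as np

def sgd_adagrad_norm(grad_fn, x0, alpha=1.0, beta=1.0, eps=0.0, T=1000, seed=0):
    """SGD with a global AdaGrad-norm-style adaptive stepsize (illustrative sketch).

    Stepsize used here (assumed form, not quoted from the paper):
        eta_t = alpha / (beta + sum_{i <= t} ||g_i||^2) ** (0.5 + eps)
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    sum_sq = 0.0
    for _ in range(T):
        g = grad_fn(x, rng)                  # stochastic gradient at the current iterate
        sum_sq += float(np.dot(g, g))        # accumulate squared gradient norms
        eta = alpha / (beta + sum_sq) ** (0.5 + eps)
        x -= eta * g                         # SGD step with the adaptive stepsize
    return x

# Hypothetical test problem: noisy gradients of f(x) = 0.5 * ||x||^2.
def noisy_quadratic_grad(x, rng, noise=0.1):
    return x + noise * rng.standard_normal(x.shape)

if __name__ == "__main__":
    x_final = sgd_adagrad_norm(noisy_quadratic_grad, x0=np.ones(5), T=5000)
    print("final iterate norm:", np.linalg.norm(x_final))
```

With eps = 0 this reduces to the standard AdaGrad-norm stepsize; the accumulated squared norms make eta_t shrink faster when the stochastic gradients are noisy and more slowly when they are nearly deterministic, which is the adaptation to noise the abstract refers to.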
