International Joint Conference on Neural Networks

Hessian-based Bounds on Learning Rate for Gradient Descent Algorithms



Abstract

Learning rate is a crucial parameter governing the convergence rate of any learning algorithm. Most learning algorithms based on the stochastic gradient descent (SGD) method depend on a heuristic choice of the learning rate. In this paper, we derive bounds on the learning rate of SGD-based adaptive learning algorithms by analyzing the largest eigenvalue of the Hessian matrix from first principles. The proposed approach is analytical. To illustrate its efficacy, we consider several high-dimensional data sets and compare the rate of convergence of the error for the neural gas algorithm, showing that the proposed bounds on the learning rate yield faster convergence than the AdaDec, Adam, and AdaDelta approaches, which require hyper-parameter tuning.
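The abstract does not reproduce the paper's derivation, but the underlying idea can be connected to the classical result that, for a quadratic objective, plain gradient descent is stable only when the step size stays below 2/λ_max(H), where λ_max(H) is the largest eigenvalue of the Hessian. The sketch below illustrates that textbook bound only, not the authors' SGD-based algorithm or their neural gas experiments: it estimates λ_max by power iteration on Hessian-vector products and then runs gradient descent with a learning rate chosen under the bound. All function names and the toy quadratic loss are invented for the example.

```python
import numpy as np

def largest_hessian_eigenvalue(hvp, dim, iters=100, seed=0):
    """Estimate lambda_max(H) by power iteration on Hessian-vector products."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = hvp(v)
        lam = float(v @ hv)            # Rayleigh quotient v^T H v with ||v|| = 1
        v = hv / np.linalg.norm(hv)
    return lam

# Toy quadratic loss L(w) = 0.5 * (w - w*)^T H (w - w*) with a fixed Hessian H.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
H = A.T @ A / 50.0 + 0.5 * np.eye(50)  # symmetric positive definite Hessian
w_star = rng.standard_normal(50)       # minimiser of the quadratic loss

hvp = lambda v: H @ v                  # Hessian-vector product for this loss
lam_max = largest_hessian_eigenvalue(hvp, dim=50)

eta = 1.0 / lam_max                    # any fixed eta < 2 / lam_max is stable here
w = np.zeros(50)
for _ in range(200):
    grad = H @ (w - w_star)            # gradient of the quadratic loss
    w -= eta * grad

print("estimated lambda_max:", lam_max)
print("distance to optimum:", np.linalg.norm(w - w_star))
```

With this choice of step size the iterates contract toward w* in every eigendirection of H; choosing eta above 2 / lam_max would make the iteration diverge, which is the kind of stability constraint a Hessian-based bound on the learning rate is meant to capture.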

