Journal of Machine Learning Research

Asymptotic Analysis via Stochastic Differential Equations of Gradient Descent Algorithms in Statistical and Computational Paradigms



Abstract

This paper investigates the asymptotic behaviors of gradient descent algorithms (in particular, accelerated gradient descent and stochastic gradient descent) in the context of the stochastic optimization problems arising in statistics and machine learning, where objective functions are estimated from available data. We show that these algorithms can be modeled computationally by continuous-time ordinary or stochastic differential equations. We establish gradient flow central limit theorems to describe the limiting dynamic behaviors of these computational algorithms and the large-sample performance of the related statistical procedures as the number of algorithm iterations and the data size both go to infinity; the gradient flow central limit theorems are governed by linear ordinary or stochastic differential equations, such as time-dependent Ornstein-Uhlenbeck processes. We illustrate that our study provides a novel unified framework for a joint computational and statistical asymptotic analysis: the computational asymptotic analysis studies the dynamic behaviors of the algorithms over time (or over the number of iterations), while the statistical asymptotic analysis investigates the large-sample behaviors of the statistical procedures (such as estimators and classifiers) computed by applying the algorithms; indeed, the statistical procedures equal the limits of the random sequences generated by these iterative algorithms as the number of iterations goes to infinity. The joint analysis based on the obtained gradient flow central limit theorems identifies four factors (learning rate, batch size, gradient covariance, and Hessian) and uses them to derive new theories regarding the local minima found by stochastic gradient descent for solving non-convex optimization problems.
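The Ornstein-Uhlenbeck limit described in the abstract can be illustrated with a minimal simulation. The sketch below runs scalar SGD on a quadratic objective f(θ) = h·θ²/2 with Gaussian mini-batch gradient noise and compares the empirical stationary variance of the iterates against the stationary variance of the corresponding OU process, η·σ_g²/(2·h·B), which combines the four identified factors (learning rate η, batch size B, gradient noise variance σ_g², and Hessian h). All parameter names and values here are illustrative choices, not notation taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

h = 1.0        # Hessian of the quadratic objective
eta = 0.01     # learning rate
batch = 4      # mini-batch size
sigma_g = 2.0  # per-example gradient noise standard deviation

theta = 1.0
samples = []
for k in range(200_000):
    # mini-batch gradient: true gradient h*theta plus averaged per-example noise
    noise = rng.normal(0.0, sigma_g, size=batch).mean()
    theta -= eta * (h * theta + noise)
    if k > 50_000:          # discard burn-in before measuring fluctuations
        samples.append(theta)

empirical_var = np.var(samples)
# stationary variance of the limiting Ornstein-Uhlenbeck process
ou_var = eta * sigma_g**2 / (2 * h * batch)
print(empirical_var, ou_var)  # the two should be close for small eta
```

Shrinking the learning rate or increasing the batch size shrinks both variances in the same proportion, which is the kind of joint dependence on the four factors that the paper's gradient flow central limit theorems make precise.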
