Journal of Machine Learning Research

Online stochastic gradient descent on non-convex losses from high-dimensional inference



Abstract

Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively optimizing a loss function. This loss function is random and often non-convex. We study the performance of the simplest version of SGD, namely online SGD, from a random start in the setting where the parameter space is high-dimensional. We develop nearly sharp thresholds for the number of samples needed for consistent estimation as one varies the dimension. Our thresholds depend only on an intrinsic property of the population loss which we call the information exponent. In particular, our results do not assume uniform control on the loss itself, such as convexity or uniform derivative bounds. The thresholds we obtain are polynomial in the dimension and the precise exponent depends explicitly on the information exponent. As a consequence of our results, we find that except for the simplest tasks, almost all of the data is used simply in the initial search phase to obtain non-trivial correlation with the ground truth. Upon attaining non-trivial correlation, the descent is rapid and exhibits law of large numbers type behavior. We illustrate our approach by applying it to a wide set of inference tasks such as phase retrieval, and parameter estimation for generalized linear models, online PCA, and spiked tensor models, as well as to supervised learning for single-layer networks with general activation functions.
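For orientation, the key quantity can be stated in one line (a paraphrase in our own notation, not the paper's exact statement): suppose the population loss Φ depends on the parameter θ only through its overlap with the ground truth θ*. The information exponent is then the order of the first non-vanishing derivative of that one-dimensional function at zero overlap:

\Phi(\theta) = \phi(m), \qquad m = \langle \theta, \theta^* \rangle, \qquad
k = \min\{\, j \ge 1 : \phi^{(j)}(0) \neq 0 \,\}.

For example, phase retrieval has k = 2 (its population loss is even in m), while the spiked p-tensor model has k = p; the larger k is, the flatter the landscape near a random start, and the longer the initial search phase.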
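To make the algorithm concrete, the following minimal sketch (ours, not the paper's code) runs online SGD for noiseless phase retrieval with Gaussian sensing vectors: one fresh sample per iteration, a gradient step on the per-sample loss, and a projection back to the unit sphere. The function name, step size, and problem sizes are illustrative assumptions.

import numpy as np

def online_sgd_phase_retrieval(d=500, n=50_000, eta=None, seed=0):
    """Online SGD for noiseless phase retrieval: y = <a, theta*>^2.

    Per-sample loss l(theta; a, y) = (y - <a, theta>^2)^2 / 2; with
    m = <a, theta> its gradient in theta is -2 * (y - m**2) * m * a.
    One fresh Gaussian sample is drawn per step (online SGD), and the
    iterate is projected back onto the unit sphere after each update.
    """
    rng = np.random.default_rng(seed)
    if eta is None:
        eta = 0.5 / (d * np.log(d))       # illustrative step size, not the paper's
    theta_star = rng.standard_normal(d)
    theta_star /= np.linalg.norm(theta_star)
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)        # random start: overlap is O(1/sqrt(d))
    for _ in range(n):
        a = rng.standard_normal(d)        # fresh sample, used once and discarded
        y = (a @ theta_star) ** 2         # noiseless measurement
        m = a @ theta
        theta = theta + 2.0 * eta * (y - m ** 2) * m * a   # descent step
        theta /= np.linalg.norm(theta)    # spherical projection
    return abs(theta @ theta_star)        # |overlap| with the ground truth

print(online_sgd_phase_retrieval())       # should approach 1.0 at these sizes

Starting from a random point, the iterate spends most of its steps near overlap of order 1/sqrt(d) before the rapid final descent, matching the search-then-descend picture described above; the absolute value is taken because phase retrieval only identifies theta* up to sign.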


