Journal of Machine Learning Research

Online stochastic gradient descent on non-convex losses from high-dimensional inference



Abstract

Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively optimizing a loss function. This loss function is random and often non-convex. We study the performance of the simplest version of SGD, namely online SGD, from a random start in the setting where the parameter space is high-dimensional. We develop nearly sharp thresholds for the number of samples needed for consistent estimation as one varies the dimension. Our thresholds depend only on an intrinsic property of the population loss which we call the information exponent. In particular, our results do not assume uniform control on the loss itself, such as convexity or uniform derivative bounds. The thresholds we obtain are polynomial in the dimension and the precise exponent depends explicitly on the information exponent. As a consequence of our results, we find that except for the simplest tasks, almost all of the data is used simply in the initial search phase to obtain non-trivial correlation with the ground truth. Upon attaining non-trivial correlation, the descent is rapid and exhibits law of large numbers type behavior. We illustrate our approach by applying it to a wide set of inference tasks such as phase retrieval, and parameter estimation for generalized linear models, online PCA, and spiked tensor models, as well as to supervised learning for single-layer networks with general activation functions.
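For orientation, the key quantity can be stated in one line (a paraphrase in our own notation, not the paper's exact statement): suppose the population loss Φ depends on the parameter θ only through its overlap with the ground truth θ*. The information exponent is then the order of the first non-vanishing derivative of that one-dimensional function at zero overlap:

\Phi(\theta) = \phi(m), \qquad m = \langle \theta, \theta^* \rangle, \qquad
k = \min\{\, j \ge 1 : \phi^{(j)}(0) \neq 0 \,\}.

For example, phase retrieval has k = 2 (its population loss is even in m), while the spiked p-tensor model has k = p; the larger k is, the flatter the landscape near a random start, and the longer the initial search phase.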
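To make the algorithm concrete, the following minimal sketch (ours, not the paper's code) runs online SGD for noiseless phase retrieval with Gaussian sensing vectors: one fresh sample per iteration, a gradient step on the per-sample loss, and a projection back to the unit sphere. The function name, step size, and problem sizes are illustrative assumptions.

import numpy as np

def online_sgd_phase_retrieval(d=500, n=50_000, eta=None, seed=0):
    """Online SGD for noiseless phase retrieval: y = <a, theta*>^2.

    Per-sample loss l(theta; a, y) = (y - <a, theta>^2)^2 / 2; with
    m = <a, theta> its gradient in theta is -2 * (y - m**2) * m * a.
    One fresh Gaussian sample is drawn per step (online SGD), and the
    iterate is projected back onto the unit sphere after each update.
    """
    rng = np.random.default_rng(seed)
    if eta is None:
        eta = 0.5 / (d * np.log(d))       # illustrative step size, not the paper's
    theta_star = rng.standard_normal(d)
    theta_star /= np.linalg.norm(theta_star)
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)        # random start: overlap is O(1/sqrt(d))
    for _ in range(n):
        a = rng.standard_normal(d)        # fresh sample, used once and discarded
        y = (a @ theta_star) ** 2         # noiseless measurement
        m = a @ theta
        theta = theta + 2.0 * eta * (y - m ** 2) * m * a   # descent step
        theta /= np.linalg.norm(theta)    # spherical projection
    return abs(theta @ theta_star)        # |overlap| with the ground truth

print(online_sgd_phase_retrieval())       # should approach 1.0 at these sizes

Starting from a random point, the iterate spends most of its steps near overlap of order 1/sqrt(d) before the rapid final descent, matching the search-then-descend picture described above; the absolute value is taken because phase retrieval only identifies theta* up to sign.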


