JMLR: Workshop and Conference Proceedings

Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron

Abstract

Modern machine learning focuses on highly expressive models that are able to fit or interpolate the data completely, resulting in zero training loss. For such models, we show that the stochastic gradients of common loss functions satisfy a strong growth condition. Under this condition, we prove that constant step-size stochastic gradient descent (SGD) with Nesterov acceleration matches the convergence rate of the deterministic accelerated method for both convex and strongly-convex functions. We also show that this condition implies that SGD can find a first-order stationary point as efficiently as full gradient descent in non-convex settings. Under interpolation, we further show that all smooth loss functions with a finite-sum structure satisfy a weaker growth condition. Given this weaker condition, we prove that SGD with a constant step-size attains the deterministic convergence rate in both the strongly-convex and convex settings. Under additional assumptions, the above results enable us to prove an $O(1/k^2)$ mistake bound for $k$ iterations of a stochastic perceptron algorithm using the squared-hinge loss. Finally, we validate our theoretical findings with experiments on synthetic and real datasets.
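
For context, the two growth conditions named above are usually written as follows for a finite-sum objective $f(w) = \frac{1}{n}\sum_{i=1}^{n} f_i(w)$ with $L$-smooth components and minimizer $w^*$; this is a sketch of the standard formulation rather than a quote from the paper, and the exact constants may differ:

$$\mathbb{E}_i\left[\|\nabla f_i(w)\|^2\right] \le \rho \, \|\nabla f(w)\|^2 \qquad \text{(strong growth condition)}$$

$$\mathbb{E}_i\left[\|\nabla f_i(w)\|^2\right] \le 2\rho L \left(f(w) - f(w^*)\right) \qquad \text{(weak growth condition)}$$

Under interpolation every $f_i$ is minimized at $w^*$, so $\nabla f(w^*) = 0$ forces $\nabla f_i(w^*) = 0$ for all $i$, which is what makes conditions of this form plausible for over-parameterized models.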
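To make the interpolation setting concrete, the following is a minimal, hypothetical sketch (not the authors' code): constant step-size SGD on the squared-hinge loss over a linearly separable synthetic dataset, mirroring the stochastic perceptron setup mentioned in the abstract. The step size 1/(2 max_i ||x_i||^2) is a heuristic stand-in for the constant prescribed by the theory, and all names below are illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Linearly separable data: labels are the sign of a random hyperplane,
# so a zero-loss (interpolating) linear classifier exists.
n, d = 1000, 20
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))

def squared_hinge_grad(w, x, yi):
    """Gradient of max(0, 1 - yi * <w, x>)**2 at a single example."""
    margin = 1.0 - yi * (x @ w)
    if margin <= 0.0:
        return np.zeros_like(w)
    return -2.0 * margin * yi * x

# Constant step size: the smoothness of each squared-hinge term scales
# with 2 * ||x_i||^2, so 1 / (2 * max_i ||x_i||^2) is a safe heuristic.
step = 1.0 / (2.0 * np.max(np.sum(X**2, axis=1)))

w = np.zeros(d)
for _ in range(100_000):
    i = rng.integers(n)                      # sample one example uniformly
    w -= step * squared_hinge_grad(w, X[i], y[i])

mistakes = int(np.sum(np.sign(X @ w) != y))
loss = float(np.mean(np.maximum(0.0, 1.0 - y * (X @ w)) ** 2))
print(f"training mistakes: {mistakes}, mean squared-hinge loss: {loss:.2e}")

Under interpolation the training loss can be driven toward zero with this fixed step size; no decaying schedule is needed.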
