DAGM German Conference on Pattern Recognition

Does SGD Implicitly Optimize for Smoothness?



Abstract

Modern neural networks can easily fit their training set perfectly. Surprisingly, despite being "overfit" in this way, they tend to generalize well to future data, thereby defying the classic bias-variance trade-off of machine learning theory. Of the many possible explanations, a prevalent one is that training by stochastic gradient descent (SGD) imposes an implicit bias that leads it to learn simple functions, and these simple functions generalize well. However, the specifics of this implicit bias are not well understood. In this work, we explore the smoothness conjecture which states that SGD is implicitly biased towards learning functions that are smooth. We propose several measures to formalize the intuitive notion of smoothness, and we conduct experiments to determine whether SGD indeed implicitly optimizes for these measures. Our findings rule out the possibility that smoothness measures based on first-order derivatives are being implicitly enforced. They are supportive, though, of the smoothness conjecture for measures based on second-order derivatives.
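To make the idea of a derivative-based smoothness measure concrete, the following is a minimal sketch, not code from the paper: it assumes a first-order measure of the form "average norm of the learned function's gradient with respect to its inputs", and the function name first_order_smoothness, the toy model, and the random probe points are all illustrative choices. A second-order analogue would differentiate this input gradient once more.

import torch
import torch.nn as nn

def first_order_smoothness(model: nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    """Average L2 norm of the input gradient d f(x)/dx over a batch
    (a lower value corresponds to a smoother learned function)."""
    inputs = inputs.clone().requires_grad_(True)
    outputs = model(inputs)                    # shape: (batch, num_outputs)
    # Summing the outputs lets one backward pass yield per-example input gradients.
    grads, = torch.autograd.grad(outputs.sum(), inputs)
    return grads.flatten(start_dim=1).norm(dim=1).mean()

# Toy usage: an (untrained) network probed at random points.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
probe = torch.randn(128, 10)
print(float(first_order_smoothness(model, probe)))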
