DAGM German Conference on Pattern Recognition

Does SGD Implicitly Optimize for Smoothness?



Abstract

Modern neural networks can easily fit their training set perfectly. Surprisingly, despite being "overfit" in this way, they tend to generalize well to future data, thereby defying the classic bias-variance trade-off of machine learning theory. Of the many possible explanations, a prevalent one is that training by stochastic gradient descent (SGD) imposes an implicit bias that leads it to learn simple functions, and these simple functions generalize well. However, the specifics of this implicit bias are not well understood. In this work, we explore the smoothness conjecture which states that SGD is implicitly biased towards learning functions that are smooth. We propose several measures to formalize the intuitive notion of smoothness, and we conduct experiments to determine whether SGD indeed implicitly optimizes for these measures. Our findings rule out the possibility that smoothness measures based on first-order derivatives are being implicitly enforced. They are supportive, though, of the smoothness conjecture for measures based on second-order derivatives.
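To make the idea of a derivative-based smoothness measure concrete, the following is a minimal sketch, not code from the paper: it assumes a first-order measure of the form "average norm of the learned function's gradient with respect to its inputs", and the function name first_order_smoothness, the toy model, and the random probe points are all illustrative choices. A second-order analogue would differentiate this input gradient once more.

import torch
import torch.nn as nn

def first_order_smoothness(model: nn.Module, inputs: torch.Tensor) -> torch.Tensor:
    """Average L2 norm of the input gradient d f(x)/dx over a batch
    (a lower value corresponds to a smoother learned function)."""
    inputs = inputs.clone().requires_grad_(True)
    outputs = model(inputs)                    # shape: (batch, num_outputs)
    # Summing the outputs lets one backward pass yield per-example input gradients.
    grads, = torch.autograd.grad(outputs.sum(), inputs)
    return grads.flatten(start_dim=1).norm(dim=1).mean()

# Toy usage: an (untrained) network probed at random points.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
probe = torch.randn(128, 10)
print(float(first_order_smoothness(model, probe)))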
