JMLR: Workshop and Conference Proceedings

Kernel and Rich Regimes in Overparametrized Models

Abstract

A recent line of work studies overparametrized neural networks in the “kernel regime,” i.e., when during training the network behaves as a kernelized linear predictor, and thus training with gradient descent has the effect of finding the corresponding minimum RKHS norm solution. This stands in contrast to other studies which demonstrate how gradient descent on overparametrized networks can induce rich implicit biases that are not RKHS norms. Building on an observation of Chizat and Bach (2018), we show how the scale of the initialization controls the transition between the “kernel” (aka lazy) and “rich” (aka active) regimes and affects generalization properties in multilayer homogeneous models. We provide a complete and detailed analysis for a family of simple depth-$D$ linear networks that exhibit an interesting and meaningful transition between the kernel and rich regimes, and we highlight an interesting role for the width of the models. We further demonstrate this transition empirically for matrix factorization and for multilayer non-linear networks.
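
As a concrete illustration of the transition described above, consider the depth-2 case of the diagonal linear networks the paper analyzes, where the effective linear predictor is beta = u^2 - v^2. The numpy sketch below is our own toy demo, not the authors' code; the problem sizes, the two initialization scales, and the learning rate are assumptions chosen so both regimes are visible. With a large initialization scale alpha, gradient descent stays close to the kernel regime and returns (approximately) the minimum l2-norm interpolant; with a small alpha, it enters the rich regime and returns a sparse interpolant with small l1 norm.

    # Toy demo (assumed setup, not the paper's code): depth-2 diagonal linear net.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 5, 20                                   # underdetermined: many interpolants
    X = rng.standard_normal((n, d))
    beta_star = np.zeros(d)
    beta_star[:2] = 1.0                            # sparse ground truth
    y = X @ beta_star

    def train_diag_net(alpha, lr=1e-3, steps=200_000):
        # Predictor beta = u**2 - v**2; gradient descent on the squared loss,
        # initialized at u = v = alpha (so beta starts at exactly zero).
        u = alpha * np.ones(d)
        v = alpha * np.ones(d)
        for _ in range(steps):
            g = X.T @ (X @ (u**2 - v**2) - y) / n  # gradient of the loss w.r.t. beta
            u -= lr * 2 * u * g                    # chain rule through u**2
            v += lr * 2 * v * g                    # chain rule through -v**2
        return u**2 - v**2

    # Minimum l2-norm interpolant, the kernel-regime reference solution.
    beta_l2 = X.T @ np.linalg.solve(X @ X.T, y)
    print(f"min-l2:      l1={np.abs(beta_l2).sum():.3f}  l2={np.linalg.norm(beta_l2):.3f}")
    for alpha in (2.0, 0.01):
        b = train_diag_net(alpha)
        print(f"alpha={alpha:<5}  l1={np.abs(b).sum():.3f}  l2={np.linalg.norm(b):.3f}"
              f"  residual={np.linalg.norm(X @ b - y):.1e}")

In this sketch the only knob is alpha: the run with alpha = 2.0 should track the min-l2 reference printed above, while the run with alpha = 0.01 should concentrate nearly all of its l1 mass on the two true coordinates, mirroring the kernel-to-rich transition the paper establishes for the depth-2 case.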
