Source journal: JMLR: Workshop and Conference Proceedings

Learning Over-Parametrized Two-Layer Neural Networks beyond NTK

Abstract

We consider the dynamics of gradient descent for learning a two-layer neural network. We assume the input $x \in \mathbb{R}^d$ is drawn from a Gaussian distribution and the label of $x$ satisfies $f^{\star}(x) = a^{\top}|W^{\star}x|$, where $a \in \mathbb{R}^d$ is a nonnegative vector and $W^{\star} \in \mathbb{R}^{d \times d}$ is an orthonormal matrix. We show that an \emph{over-parameterized} two-layer neural network with ReLU activation, trained by gradient descent from \emph{random initialization}, can provably learn the ground truth network with population loss at most $o(1/d)$ in polynomial time with polynomially many samples. On the other hand, we prove that any kernel method, including the Neural Tangent Kernel, with a number of samples polynomial in $d$, has population loss at least $\Omega(1/d)$.
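For readers who want to experiment with the setup the abstract describes, below is a minimal NumPy sketch, not the paper's construction or proof: it samples Gaussian inputs, labels them with a ground truth of the form $f^{\star}(x) = a^{\top}|W^{\star}x|$ (nonnegative $a$, orthonormal $W^{\star}$), and trains an over-parameterized two-layer ReLU network by plain gradient descent from random initialization. All concrete choices (dimension, width, sample size, learning rate, step count) are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (assumed hyperparameters, not the paper's analysis):
# fit f*(x) = a^T |W* x| with an over-parameterized two-layer ReLU net
# trained by gradient descent from random initialization.
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 10, 200, 5000          # input dim, hidden width (m >> d), sample size

# Ground truth: nonnegative vector a, orthonormal matrix W* (d x d).
a_star = np.abs(rng.normal(size=d))
W_star, _ = np.linalg.qr(rng.normal(size=(d, d)))

X = rng.normal(size=(n, d))              # Gaussian inputs
y = np.abs(X @ W_star.T) @ a_star        # labels f*(x) = a^T |W* x|

# Over-parameterized student: f(x) = sum_j v_j * relu(w_j . x).
W = rng.normal(size=(m, d)) / np.sqrt(d)         # random first-layer init
v = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # random second-layer init

lr = 0.05
for step in range(2000):
    pre = X @ W.T                        # (n, m) pre-activations
    h = np.maximum(pre, 0.0)             # ReLU
    pred = h @ v
    err = pred - y
    loss = 0.5 * np.mean(err ** 2)       # empirical squared loss

    # Gradients of the empirical loss w.r.t. both layers.
    grad_v = h.T @ err / n
    grad_W = ((err[:, None] * (pre > 0)) * v).T @ X / n
    v -= lr * grad_v
    W -= lr * grad_W

    if step % 500 == 0:
        print(f"step {step:4d}  loss {loss:.5f}")
```

With the width much larger than the input dimension, the squared loss in this sketch typically drops well below its initial value after a few hundred steps; freezing the first layer (a crude proxy for a fixed-feature, kernel-style regime) is a simple way to see the kind of gap between trained features and fixed features that the abstract's lower bound is about.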
