JMLR: Workshop and Conference Proceedings

Gradient Descent Learns One-hidden-layer CNN: Don’t be Afraid of Spurious Local Minima



Abstract

We consider the problem of learning a one-hidden-layer neural network with a non-overlapping convolutional layer and ReLU activation function, i.e., $f(Z; w, a) = \sum_j a_j \sigma(w^\top Z_j)$, in which both the convolutional weights $w$ and the output weights $a$ are parameters to be learned. We prove that with Gaussian input $\mathbf{Z}$ there is a spurious local minimizer. Surprisingly, in the presence of the spurious local minimizer, gradient descent with weight normalization, starting from randomly initialized weights, can still be proven to recover the true parameters with constant probability (which can be boosted to probability $1$ with multiple restarts). We also show that, with constant probability, the same procedure can converge to the spurious local minimum, showing that this local minimum plays a non-trivial role in the dynamics of gradient descent. Furthermore, a quantitative analysis shows that the gradient descent dynamics has two phases: it starts off slowly, but converges much faster after several iterations.
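To make the setup concrete, below is a minimal NumPy sketch of the non-overlapping one-hidden-layer CNN $f(Z; w, a) = \sum_j a_j \sigma(w^\top Z_j)$ trained by gradient descent from a random initialization. The mini-batch estimate of the population gradient, the per-step re-normalization of $w$ (used here as a stand-in for the paper's weight-normalized update), and all sizes and learning rates are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (assumptions, not from the paper)
k, p = 4, 10            # k non-overlapping patches, each of dimension p
n_steps, lr = 2000, 0.05

relu = lambda x: np.maximum(x, 0.0)

# Ground-truth (teacher) parameters; w_star is taken to be unit norm
w_star = rng.normal(size=p); w_star /= np.linalg.norm(w_star)
a_star = rng.normal(size=k)

def model(Z, w, a):
    # f(Z; w, a) = sum_j a_j * relu(w^T Z_j); Z holds one patch per row
    return relu(Z @ w) @ a

def grad_step(w, a, batch=256):
    # Mini-batch estimate of the population gradient of the squared loss
    # under Gaussian input patches (an approximation used for illustration).
    gw = np.zeros_like(w); ga = np.zeros_like(a)
    for _ in range(batch):
        Z = rng.normal(size=(k, p))      # fresh Gaussian input patches
        err = model(Z, w, a) - model(Z, w_star, a_star)
        pre = Z @ w                      # pre-activations, one per patch
        ga += err * relu(pre)
        gw += err * (Z.T @ (a * (pre > 0)))
    return gw / batch, ga / batch

# Random initialization, then gradient descent; re-normalizing w each step
# mimics the weight-normalized parameterization analyzed in the paper.
w = rng.normal(size=p); w /= np.linalg.norm(w)
a = 0.1 * rng.normal(size=k)
for t in range(n_steps):
    gw, ga = grad_step(w, a)
    w -= lr * gw; w /= np.linalg.norm(w)
    a -= lr * ga
    if (t + 1) % 500 == 0:
        print(f"step {t+1}: |w - w*| = {np.linalg.norm(w - w_star):.3f}, "
              f"|a - a*| = {np.linalg.norm(a - a_star):.3f}")
```

Depending on the random seed, a run may recover $(w^*, a^*)$ or stall near a different stationary point, which is consistent with the abstract's claim that both outcomes occur with constant probability and motivates the multiple-restart strategy.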
