We consider the problem of learning a one-hidden-layer neural network with a non-overlapping convolutional layer and ReLU activation, i.e., $f(\mathbf{Z}; w, a) = \sum_j a_j \sigma(w^\top \mathbf{Z}_j)$, in which both the convolutional weights $w$ and the output weights $a$ are parameters to be learned. We prove that with Gaussian input $\mathbf{Z}$ there is a spurious local minimizer. Surprisingly, in the presence of this spurious local minimizer, gradient descent with weight normalization starting from randomly initialized weights can still be proven to recover the true parameters with constant probability (which can be boosted to probability $1$ with multiple restarts). We also show that with constant probability the same procedure can instead converge to the spurious local minimum, so the local minimum plays a non-trivial role in the dynamics of gradient descent. Furthermore, a quantitative analysis shows that the gradient descent dynamics has two phases: it starts off slow, but converges much faster after several iterations.
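The model $f(\mathbf{Z}; w, a) = \sum_j a_j \sigma(w^\top \mathbf{Z}_j)$ can be sketched in a few lines of NumPy; the patch count, patch dimension, and variable names below are illustrative assumptions, not the paper's notation for any specific experiment.

```python
import numpy as np

def forward(Z, w, a):
    """One-hidden-layer CNN with non-overlapping patches.

    Z : (k, p) array, the k non-overlapping patches Z_j of the input.
    w : (p,)  array, the shared convolutional filter.
    a : (k,)  array, the output-layer weights.
    Returns f(Z; w, a) = sum_j a_j * relu(w^T Z_j).
    """
    pre = Z @ w                      # pre-activations w^T Z_j, shape (k,)
    return a @ np.maximum(pre, 0.0)  # weighted sum of ReLU outputs

# Toy instance with Gaussian patches, matching the input model in the abstract.
rng = np.random.default_rng(0)
k, p = 4, 3                          # hypothetical sizes for illustration
Z = rng.standard_normal((k, p))
w_true = rng.standard_normal(p)
a_true = rng.standard_normal(k)
y = forward(Z, w_true, a_true)       # label generated by the true parameters
```

Because the patches are non-overlapping, each $\mathbf{Z}_j$ is an independent slice of the input, and the filter $w$ is shared across all of them.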