IEEE International Conference on Acoustics, Speech and Signal Processing

Quasi Black Hole Effect of Gradient Descent in Large Dimension: Consequence on Neural Network Learning


Abstract

The gradient descent to a local minimum is the key ingredient of deep neural network learning techniques. We consider a function $L_m(\cdot)$ in dimension $n$ with a random set of $m$ absolute minima. When $\log m = o(n)$, we show that a gradient descent from a random initial point almost always ends at a unique local minimum located approximately at the centroid of the absolute minima. This fake minimum acts like an absorbing node, but its value under $L_m(\cdot)$ can be far above the values that $L_m(\cdot)$ attains at the absolute minima, and it sometimes yields very bad coefficients for the neural network. Fortunately, in most cases the fake minimum leads to a neural network whose predictions are not so bad, with an error rate of order $n^{-1/4}$. The only way to escape the fake minimum is to restart the gradient descent from a new random point, and we show that finding a good initial point takes an average time at least proportional to $e^{bn}/(mn^2)$ for some $b > 0$.
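
The centroid effect is easy to reproduce numerically. The sketch below is only an illustration under assumed details, not the paper's construction: the loss $\ell(x) = \sum_{i=1}^{m} \log \lVert x - a_i \rVert^2$ (the log of a product of squared distances, with absolute minima at hypothetical random points $a_1, \dots, a_m$) is a stand-in for $L_m(\cdot)$, and the parameter values are arbitrary. With $\log m \ll n$, plain gradient descent from a random start settles near the centroid of the $a_i$ rather than at any of them.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 8                      # dimension n, number m of absolute minima (log m << n)
A = rng.normal(size=(m, n))        # random absolute minima a_1, ..., a_m (assumed Gaussian)
centroid = A.mean(axis=0)

def grad(x):
    # Gradient of the toy loss sum_i log ||x - a_i||^2: each term pulls x toward
    # a_i with weight 1/||x - a_i||^2.  Far from all a_i the distances are nearly
    # equal, so the sum is approximately a pull toward the centroid.
    d = x - A                                    # (m, n) displacements x - a_i
    r2 = np.sum(d * d, axis=1, keepdims=True)    # squared distances ||x - a_i||^2
    return np.sum(2.0 * d / r2, axis=0)

x = rng.normal(scale=5.0, size=n)                # random initial point, far from the a_i
for _ in range(10_000):
    x -= 0.2 * grad(x)                           # plain gradient descent

print("distance to centroid:   ", np.linalg.norm(x - centroid))
print("distance to nearest a_i:", np.min(np.linalg.norm(A - x, axis=1)))
```

On a typical run the first printed distance is small while the second stays on the order of $\sqrt{n}$: the descent is absorbed by a fake stationary point near the centroid instead of reaching any absolute minimum, which is the quasi black hole behavior described above. Escaping it by restarting from fresh random points is exactly the strategy whose average time the paper bounds below by $e^{bn}/(mn^2)$.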
