IEEE International Conference on Acoustics, Speech and Signal Processing

Quasi Black Hole Effect of Gradient Descent in Large Dimension: Consequence on Neural Network Learning


Abstract

The gradient descent to a local minimum is the key ingredient of deep neural network learning techniques. We consider a function $L_m(\cdot)$ in dimension $n$ with a random set of $m$ absolute minima. When $\log m = o(n)$, we show that a gradient descent from a random initial point almost always ends at a unique local minimum located approximately at the centroid of the absolute minima. This fake minimum acts like an absorbing node, but its value under $L_m(\cdot)$ can be far above the values that $L_m(\cdot)$ attains at the absolute minima, and it sometimes yields very bad coefficients for the neural network. Fortunately, in most cases the fake minimum leads to a neural network whose predictions are not so bad, with an error rate of order $n^{-1/4}$. The only way to escape the fake minimum is to restart the gradient descent from a new random point, and we show that finding a good initial point takes an average time at least proportional to $e^{bn}/(mn^2)$ for some $b > 0$.
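
The centroid effect is easy to reproduce numerically. The sketch below is only an illustration under assumed details, not the paper's construction: the loss $\ell(x) = \sum_{i=1}^{m} \log \lVert x - a_i \rVert^2$ (the log of a product of squared distances, with absolute minima at hypothetical random points $a_1, \dots, a_m$) is a stand-in for $L_m(\cdot)$, and the parameter values are arbitrary. With $\log m \ll n$, plain gradient descent from a random start settles near the centroid of the $a_i$ rather than at any of them.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 8                      # dimension n, number m of absolute minima (log m << n)
A = rng.normal(size=(m, n))        # random absolute minima a_1, ..., a_m (assumed Gaussian)
centroid = A.mean(axis=0)

def grad(x):
    # Gradient of the toy loss sum_i log ||x - a_i||^2: each term pulls x toward
    # a_i with weight 1/||x - a_i||^2.  Far from all a_i the distances are nearly
    # equal, so the sum is approximately a pull toward the centroid.
    d = x - A                                    # (m, n) displacements x - a_i
    r2 = np.sum(d * d, axis=1, keepdims=True)    # squared distances ||x - a_i||^2
    return np.sum(2.0 * d / r2, axis=0)

x = rng.normal(scale=5.0, size=n)                # random initial point, far from the a_i
for _ in range(10_000):
    x -= 0.2 * grad(x)                           # plain gradient descent

print("distance to centroid:   ", np.linalg.norm(x - centroid))
print("distance to nearest a_i:", np.min(np.linalg.norm(A - x, axis=1)))
```

On a typical run the first printed distance is small while the second stays on the order of $\sqrt{n}$: the descent is absorbed by a fake stationary point near the centroid instead of reaching any absolute minimum, which is the quasi black hole behavior described above. Escaping it by restarting from fresh random points is exactly the strategy whose average time the paper bounds below by $e^{bn}/(mn^2)$.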
