Journal of Complexity

Non-convergence of stochastic gradient descent in the training of deep neural networks



Abstract

Deep neural networks have successfully been trained in various application areas with stochastic gradient descent. However, there exists no rigorous mathematical explanation why this works so well. The training of neural networks with stochastic gradient descent has four different discretization parameters: (i) the network architecture; (ii) the amount of training data; (iii) the number of gradient steps; and (iv) the number of randomly initialized gradient trajectories. While it can be shown that the approximation error converges to zero if all four parameters are sent to infinity in the right order, we demonstrate in this paper that stochastic gradient descent fails to converge for ReLU networks if their depth is much larger than their width and the number of random initializations does not increase to infinity fast enough. (C) 2020 The Author(s). Published by Elsevier Inc.
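The regime described in the abstract is straightforward to set up numerically. The sketch below is an illustrative experiment, not the paper's construction: it trains a very deep, very narrow ReLU network with plain SGD from a small number of random initializations and reports the best empirical risk achieved. The target function, depth, width, step size, batch size, and number of restarts are all assumptions chosen for illustration; in the abstract's regime (depth much larger than width, few random initializations), the best risk over the trajectories typically fails to approach zero.

```python
# Minimal, illustrative sketch of the setting in the abstract (all hyperparameters
# are assumptions): deep narrow ReLU network, plain SGD, few random restarts.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_network(depth: int, width: int) -> nn.Sequential:
    """Fully connected ReLU network mapping R -> R with `depth` hidden layers."""
    layers = [nn.Linear(1, width), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)

def train_once(depth, width, n_train=256, n_steps=2000, lr=1e-2):
    """One SGD trajectory; returns the final empirical L2 risk on the training data."""
    x = torch.rand(n_train, 1)          # training inputs on [0, 1]
    y = x ** 2                          # illustrative target function (assumption)
    net = make_network(depth, width)
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(n_steps):
        idx = torch.randint(0, n_train, (32,))   # minibatch of size 32
        opt.zero_grad()
        loss = loss_fn(net(x[idx]), y[idx])
        loss.backward()
        opt.step()
    with torch.no_grad():
        return loss_fn(net(x), y).item()

# Depth much larger than width, and only a handful of random initializations.
risks = [train_once(depth=50, width=2) for _ in range(5)]
print("final risks per trajectory:", [round(r, 4) for r in risks])
print("best risk over initializations:", round(min(risks), 4))
```

Increasing the width relative to the depth, or the number of random restarts, changes which of the four discretization parameters dominates; the abstract's point is that convergence requires the number of random initializations to grow fast enough alongside the other three.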

