Journal of Complexity

Non-convergence of stochastic gradient descent in the training of deep neural networks



Abstract

Deep neural networks have successfully been trained in various application areas with stochastic gradient descent. However, there exists no rigorous mathematical explanation why this works so well. The training of neural networks with stochastic gradient descent has four different discretization parameters: (i) the network architecture; (ii) the amount of training data; (iii) the number of gradient steps; and (iv) the number of randomly initialized gradient trajectories. While it can be shown that the approximation error converges to zero if all four parameters are sent to infinity in the right order, we demonstrate in this paper that stochastic gradient descent fails to converge for ReLU networks if their depth is much larger than their width and the number of random initializations does not increase to infinity fast enough. (C) 2020 The Author(s). Published by Elsevier Inc.
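The regime described in the abstract is straightforward to set up numerically. The sketch below is an illustrative experiment, not the paper's construction: it trains a very deep, very narrow ReLU network with plain SGD from a small number of random initializations and reports the best empirical risk achieved. The target function, depth, width, step size, batch size, and number of restarts are all assumptions chosen for illustration; in the abstract's regime (depth much larger than width, few random initializations), the best risk over the trajectories typically fails to approach zero.

```python
# Minimal, illustrative sketch of the setting in the abstract (all hyperparameters
# are assumptions): deep narrow ReLU network, plain SGD, few random restarts.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_network(depth: int, width: int) -> nn.Sequential:
    """Fully connected ReLU network mapping R -> R with `depth` hidden layers."""
    layers = [nn.Linear(1, width), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)

def train_once(depth, width, n_train=256, n_steps=2000, lr=1e-2):
    """One SGD trajectory; returns the final empirical L2 risk on the training data."""
    x = torch.rand(n_train, 1)          # training inputs on [0, 1]
    y = x ** 2                          # illustrative target function (assumption)
    net = make_network(depth, width)
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(n_steps):
        idx = torch.randint(0, n_train, (32,))   # minibatch of size 32
        opt.zero_grad()
        loss = loss_fn(net(x[idx]), y[idx])
        loss.backward()
        opt.step()
    with torch.no_grad():
        return loss_fn(net(x), y).item()

# Depth much larger than width, and only a handful of random initializations.
risks = [train_once(depth=50, width=2) for _ in range(5)]
print("final risks per trajectory:", [round(r, 4) for r in risks])
print("best risk over initializations:", round(min(risks), 4))
```

Increasing the width relative to the depth, or the number of random restarts, changes which of the four discretization parameters dominates; the abstract's point is that convergence requires the number of random initializations to grow fast enough alongside the other three.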

