Annual Allerton Conference on Communication, Control, and Computing

On the interplay of network structure and gradient convergence in deep learning



Abstract

The regularization and output consistency behavior of dropout and layer-wise pretraining for learning deep networks have been fairly well studied. However, our understanding of how the asymptotic convergence of backpropagation in deep architectures is related to the structural properties of the network and other design choices (like denoising and dropout rate) is less clear at this time. An interesting question one may ask is whether the network architecture and input data statistics may guide the choices of learning parameters and vice versa. In this work, we explore the association between such structural, distributional and learnability aspects vis-à-vis their interaction with parameter convergence rates. We present a framework to address these questions based on convergence of backpropagation for general nonconvex objectives using first-order information. This analysis suggests an interesting relationship between feature denoising and dropout. Building upon these results, we obtain a setup that provides systematic guidance regarding the choice of learning parameters and network sizes that achieve a certain level of convergence (in the optimization sense) often mediated by statistical attributes of the inputs. Our results are supported by a set of experimental evaluations as well as independent empirical observations reported by other groups.
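The abstract's central object — convergence of backpropagation on a nonconvex objective measured through first-order information (the gradient norm), as modulated by design choices such as the dropout rate and input denoising — can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the paper's actual setup: the `train` helper, the tied-weight denoising autoencoder, and the parameter names `zeta` (dropout rate) and `noise` (input masking rate) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, hidden=16, zeta=0.5, noise=0.1, lr=0.05, iters=200):
    """Toy denoising autoencoder with dropout, trained by SGD backpropagation.

    Returns the history of squared gradient norms ||grad||^2 — the
    first-order stationarity measure that nonconvex convergence analyses
    bound. How fast it decays depends on `zeta`, `noise`, and the sizes.
    """
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, hidden))  # tied encoder/decoder weights
    history = []
    for _ in range(iters):
        x = X[rng.integers(n)]
        x_noisy = x * (rng.random(d) > noise)               # masking-noise denoising input
        h = sigmoid(x_noisy @ W)                            # encoder
        mask = (rng.random(hidden) > zeta) / (1.0 - zeta)   # inverted dropout
        h_drop = h * mask
        x_hat = h_drop @ W.T                                # tied-weight decoder
        err = x_hat - x                                     # reconstruction error
        # Backpropagate through both weight-sharing paths of W.
        dh = (W.T @ err) * mask * h * (1.0 - h)             # gradient at hidden pre-activation
        grad = np.outer(err, h_drop) + np.outer(x_noisy, dh)
        W -= lr * grad
        history.append(float(np.sum(grad ** 2)))            # first-order convergence measure
    return history

X = rng.random((100, 8))
hist = train(X)
```

Sweeping `zeta` or `noise` and comparing the resulting `hist` curves mimics, in miniature, the kind of interaction between learning parameters and gradient convergence that the paper analyzes.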

