JMLR: Workshop and Conference Proceedings

Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization

Abstract

Vanishing and exploding gradients are two of the main obstacles in training deep neural networks, especially in capturing long-range dependencies in recurrent neural networks (RNNs). In this paper, we present an efficient parameterization of the transition matrix of an RNN that allows us to stabilize the gradients that arise in its training. Specifically, we parameterize the transition matrix by its singular value decomposition (SVD), which allows us to explicitly track and control its singular values. We attain efficiency by using tools that are common in numerical linear algebra, namely Householder reflectors for representing the orthogonal matrices that arise in the SVD. By explicitly controlling the singular values, our proposed Spectral-RNN method allows us to easily solve the exploding gradient problem, and we observe that it empirically solves the vanishing gradient issue to a large extent. We note that the SVD parameterization can be used for any rectangular weight matrix, hence it can be easily extended to any deep neural network, such as a multi-layer perceptron. Theoretically, we demonstrate that our parameterization does not lose any expressive power, and show how it potentially makes the optimization process easier. Our extensive experimental results also demonstrate that the proposed framework converges faster and generalizes well, especially in capturing long-range dependencies, as shown on the synthetic addition and copy tasks, as well as on the MNIST and Penn Tree Bank data sets.
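
To make the parameterization concrete, below is a minimal NumPy sketch of the idea the abstract describes: the transition matrix is built as W = U diag(σ) Vᵀ, where U and V are products of Householder reflectors and the singular values σ are confined to a band around 1, so the gradient norm can neither explode nor collapse. The function names, the tanh-based clamping of the singular values, and the band width r are illustrative choices made for this sketch, not the paper's exact implementation.

```python
import numpy as np

def householder(u):
    """Householder reflector H = I - 2 u u^T / (u^T u); H is orthogonal."""
    u = u.reshape(-1, 1)
    return np.eye(len(u)) - 2.0 * (u @ u.T) / (u.T @ u)

def orthogonal_from_reflectors(us):
    """Product of k Householder reflectors, yielding an n x n orthogonal matrix."""
    Q = np.eye(us.shape[1])
    for u in us:
        Q = Q @ householder(u)
    return Q

def spectral_transition(u_params, v_params, sigma_params, r=0.1):
    """W = U diag(sigma) V^T with every singular value inside [1 - r, 1 + r]."""
    U = orthogonal_from_reflectors(u_params)
    V = orthogonal_from_reflectors(v_params)
    # Squash unconstrained raw parameters into the band around 1
    # (one plausible clamping; the paper's exact scheme may differ).
    sigma = 1.0 + r * np.tanh(sigma_params)
    return U @ np.diag(sigma) @ V.T

rng = np.random.default_rng(0)
n, k = 6, 6                      # hidden size; reflectors per orthogonal factor
u_params = rng.standard_normal((k, n))
v_params = rng.standard_normal((k, n))
sigma_params = rng.standard_normal(n)

W = spectral_transition(u_params, v_params, sigma_params, r=0.05)
print("singular values:", np.linalg.svd(W, compute_uv=False))
# All singular values lie in [0.95, 1.05] by construction.
```

Because W is stored only through the reflector vectors and raw singular-value parameters, gradient updates to those parameters keep the factorization exact at every step; no reprojection onto the orthogonal group is needed, which is what makes the Householder representation efficient.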
