
Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization

International Conference on Machine Learning


Abstract

Vanishing and exploding gradients are two of the main obstacles in training deep neural networks, especially in capturing long-range dependencies in recurrent neural networks (RNNs). In this paper, we present an efficient parameterization of the transition matrix of an RNN that allows us to stabilize the gradients that arise in its training. Specifically, we parameterize the transition matrix by its singular value decomposition (SVD), which allows us to explicitly track and control its singular values. We attain efficiency by using tools that are common in numerical linear algebra, namely Householder reflectors, to represent the orthogonal matrices that arise in the SVD. By explicitly controlling the singular values, our proposed Spectral-RNN method provably solves the exploding gradient problem, and we observe that it empirically alleviates the vanishing gradient problem to a large extent. We note that the SVD parameterization can be applied to any rectangular weight matrix, so it extends easily to any deep neural network, such as a multi-layer perceptron. Theoretically, we demonstrate that our parameterization loses no expressive power, and we show how it controls the generalization of RNNs on classification tasks. Our extensive experimental results also demonstrate that the proposed framework converges faster and generalizes well, especially in capturing long-range dependencies, as shown on the synthetic addition and copy tasks, as well as on the MNIST and Penn Tree Bank data sets.
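To make the construction concrete, below is a minimal NumPy sketch of an SVD-parameterized transition matrix in the spirit of the abstract. It is not the authors' implementation: the names (U_vecs, V_vecs, p), the choice of n reflectors per orthogonal factor, and the sigmoid squashing of singular values into [1 - r, 1 + r] are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # hidden dimension (toy size)

# Illustrative parameters: one Householder vector per reflector for U and V,
# plus unconstrained logits p that are squashed into the singular values.
U_vecs = rng.standard_normal((n, n))
V_vecs = rng.standard_normal((n, n))
p = rng.standard_normal(n)

def reflect(v, x):
    # Apply H = I - 2 v v^T / (v^T v) to x; H is orthogonal and symmetric.
    return x - 2.0 * v * (v @ x) / (v @ v)

def apply_Q(vecs, x):
    # Compute (H_1 H_2 ... H_k) x: apply H_k first, then H_{k-1}, and so on.
    for v in vecs[::-1]:
        x = reflect(v, x)
    return x

def apply_Qt(vecs, x):
    # Compute (H_1 H_2 ... H_k)^T x = H_k ... H_1 x.
    for v in vecs:
        x = reflect(v, x)
    return x

def singular_values(p, r=0.1):
    # Keep every singular value in [1 - r, 1 + r] via a sigmoid, so repeated
    # multiplication through time can neither explode nor decay quickly.
    return (1.0 - r) + 2.0 * r / (1.0 + np.exp(-p))

def transition_apply(h):
    # W h with W = U diag(s) V^T, never forming W explicitly.
    h = apply_Qt(V_vecs, h)        # V^T h
    h = singular_values(p) * h     # diag(s) (V^T h)
    return apply_Q(U_vecs, h)      # U diag(s) V^T h

h = rng.standard_normal(n)
Wh = transition_apply(h)
# Since U and V are exactly orthogonal, ||W h|| <= max(s) * ||h||.
print(np.linalg.norm(Wh), singular_values(p).max() * np.linalg.norm(h))
```

Two properties of this sketch track the abstract's claims: applying k reflectors costs O(kn) per matrix-vector product, and the factors stay exactly orthogonal by construction, so no re-orthonormalization step is needed and the gradient scale through the transition is governed entirely by the explicitly bounded singular values.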
