Workshop on Automatic Speech Recognition and Understanding

ACCELERATING HESSIAN-FREE OPTIMIZATION FOR DEEP NEURAL NETWORKS BY IMPLICIT PRECONDITIONING AND SAMPLING

Abstract

Hessian-free training has become a popular parallel second-order optimization technique for Deep Neural Network training. This study aims to speed up Hessian-free training, both by decreasing the amount of data used for training and by reducing the number of Krylov subspace solver iterations used for implicit estimation of the Hessian. In this paper, we develop an L-BFGS-based preconditioning scheme that avoids the need to access the Hessian explicitly. Since L-BFGS cannot be regarded as a fixed-point iteration, we further propose employing flexible Krylov subspace solvers that retain the desired theoretical convergence guarantees of their conventional counterparts. Second, we propose a new sampling algorithm that geometrically increases the amount of data used for gradient and Krylov subspace iteration calculations. On a 50-hr English Broadcast News task, we find that these methodologies provide roughly a 1.5x speedup, whereas on a 300-hr Switchboard task they provide over a 2.3x speedup, with no loss in WER. These results suggest that even greater speedups can be expected as problem scale and complexity grow.
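The abstract only names the two solver-side ingredients: an L-BFGS preconditioner that never accesses the Hessian explicitly, and a flexible Krylov solver that tolerates a preconditioner changing between iterations. The sketch below is a minimal illustration of how these can fit together, not the paper's implementation: it runs a flexible preconditioned conjugate gradient on the damped Gauss-Newton system, applies the preconditioner through the standard L-BFGS two-loop recursion, and, as an assumption, harvests the curvature pairs (s, y) from CG's own steps, where y = As holds exactly for a quadratic. All names and parameters (`lbfgs_precondition`, `flexible_pcg`, `history`, `n_iters`) are hypothetical.

```python
import numpy as np

def lbfgs_precondition(v, pairs):
    """Apply an L-BFGS approximation of H^{-1} to v via the two-loop
    recursion over stored curvature pairs (s_i, y_i); the Hessian itself
    is never formed or accessed."""
    if not pairs:
        return v.copy()
    q = v.copy()
    rhos = [1.0 / (y @ s) for s, y in pairs]
    alphas = []
    for (s, y), rho in zip(reversed(pairs), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    s_k, y_k = pairs[-1]
    q *= (s_k @ y_k) / (y_k @ y_k)           # initial scaling gamma * I
    for (s, y), rho, a in zip(pairs, rhos, reversed(alphas)):
        b = rho * (y @ q)
        q += (a - b) * s
    return q

def flexible_pcg(Av, b, n_iters=50, history=10, tol=1e-6):
    """Flexible preconditioned CG for A x = b, where Av(p) returns the
    damped Gauss-Newton matrix-vector product.  Because the L-BFGS
    preconditioner is rebuilt as pairs accumulate, the flexible
    (Polak-Ribiere-style) beta replaces the usual Fletcher-Reeves one."""
    pairs = []                                # curvature pairs harvested below
    x = np.zeros_like(b)
    r = b.copy()
    z = lbfgs_precondition(r, pairs)
    p = z.copy()
    for _ in range(n_iters):
        Ap = Av(p)
        alpha = (r @ z) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) <= tol * np.linalg.norm(b):
            return x
        # assumption: with s = alpha*p, y = A s holds exactly on a quadratic
        pairs.append((alpha * p, alpha * Ap))
        pairs = pairs[-history:]
        z_new = lbfgs_precondition(r_new, pairs)  # preconditioner has changed
        beta = (z_new @ (r_new - r)) / (z @ r)    # flexible beta
        p = z_new + beta * p
        r, z = r_new, z_new
    return x

# toy usage: a random SPD system standing in for (Gauss-Newton + damping) d = -g
rng = np.random.default_rng(0)
B = rng.standard_normal((200, 200))
A = B @ B.T + 0.1 * np.eye(200)
g = rng.standard_normal(200)
d = flexible_pcg(lambda p: A @ p, -g)
```

The flexible beta is what preserves the conventional convergence behavior here: a standard CG update assumes a fixed preconditioner, which an iteration-dependent L-BFGS approximation violates.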
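The sampling side is likewise described only at a high level: the amount of data used for gradient and Krylov-iteration computations grows geometrically across Hessian-free iterations. Below is a minimal sketch of such a schedule, with placeholder constants (`frac0`, `growth`) that are not taken from the paper.

```python
import numpy as np

def geometric_sample_schedule(dataset_size, n_hf_iters,
                              frac0=0.01, growth=1.5, seed=0):
    """Yield one index set per HF iteration; the sample size starts at
    frac0 * dataset_size and grows by `growth` each iteration until the
    full training set is reached.  Constants are illustrative only."""
    rng = np.random.default_rng(seed)
    size = max(1, int(frac0 * dataset_size))
    for _ in range(n_hf_iters):
        size = min(size, dataset_size)
        yield rng.choice(dataset_size, size=size, replace=False)
        size = int(np.ceil(size * growth))

# e.g.: for idx in geometric_sample_schedule(num_utterances, 20): ...
```

The rationale for schedules of this shape is that early iterations tolerate cheap, noisy gradient and curvature estimates, while later iterations, which need accurate update directions, see nearly the full training set.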
