
Adaptive regularization



Abstract

Regularization, e.g., in the form of weight decay, is important for training and optimization of neural network architectures. In this work the authors provide a tool, based on asymptotic sampling theory, for iterative estimation of weight decay parameters. The basic idea is to perform gradient descent on the estimated generalization error with respect to the regularization parameters. The scheme is implemented in the authors' Designer Net framework for network training and pruning, i.e., it is based on the diagonal Hessian approximation. The scheme requires no essential computational overhead beyond what is needed for training and pruning. The viability of the approach is demonstrated in an experiment concerning prediction of the chaotic Mackey-Glass series. The authors find that the optimized weight decays are relatively large for densely connected networks in the initial pruning phase, and that they decrease as pruning proceeds.
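The core idea of the abstract — an outer gradient descent on an estimated generalization error with respect to the weight-decay parameter — can be illustrated with a minimal sketch. This is not the authors' Designer Net scheme: it uses closed-form ridge regression instead of network training, and a held-out validation error stands in for the paper's asymptotic-sampling-theory estimator. All names and data here are hypothetical illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for the network-training problem; the
# validation error below plays the role of the estimated generalization
# error (the paper's estimator comes from asymptotic sampling theory,
# not a held-out set).
n_tr, n_va, d = 50, 50, 20
X_tr = rng.normal(size=(n_tr, d))
X_va = rng.normal(size=(n_va, d))
w_true = rng.normal(size=d)
y_tr = X_tr @ w_true + 0.5 * rng.normal(size=n_tr)
y_va = X_va @ w_true + 0.5 * rng.normal(size=n_va)

def fit(lam):
    """Weights minimizing squared training error plus lam * ||w||^2."""
    A = X_tr.T @ X_tr + lam * np.eye(d)
    return np.linalg.solve(A, X_tr.T @ y_tr)

def val_error(w):
    r = X_va @ w - y_va
    return r @ r / n_va

lam, eta = 1.0, 0.5   # initial weight decay and outer-loop step size
for _ in range(200):
    A = X_tr.T @ X_tr + lam * np.eye(d)
    w = np.linalg.solve(A, X_tr.T @ y_tr)
    # w(lam) = A^{-1} X_tr^T y_tr, hence dw/dlam = -A^{-1} w; the chain
    # rule then gives the derivative of the estimated error w.r.t. lam.
    dw_dlam = -np.linalg.solve(A, w)
    grad_w = 2.0 * X_va.T @ (X_va @ w - y_va) / n_va
    g = grad_w @ dw_dlam
    # Descend in log(lam) so the decay parameter stays positive.
    lam *= np.exp(-eta * lam * g)

print(f"adapted weight decay: {lam:.4f}, "
      f"validation error: {val_error(fit(lam)):.4f}")
```

The multiplicative update is gradient descent in log(lam), a common trick for keeping a regularization parameter positive; the paper's observation that optimized decays shrink as pruning proceeds would correspond here to lam adapting as the model family changes.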


