The Stochastic Delta Rule: Faster and More Accurate Deep Learning Through Adaptive Weight Noise

Abstract

Multilayer neural networks have led to remarkable performance on many kinds of benchmark tasks in text, speech, and image processing. Nonlinear parameter estimation in hierarchical models is known to be subject to overfitting and misspecification. One approach to these estimation and related problems (e.g., saddle points, collinearity, feature discovery) is called Dropout. The Dropout algorithm removes hidden units according to a binomial random variable with probability p prior to each update, creating random “shocks” to the network that are averaged over updates (thus creating weight sharing). In this letter, we reestablish an older parameter search method and show that Dropout is a special case of this more general model, the stochastic delta rule (SDR), published originally in 1990. Unlike Dropout, SDR redefines each weight in the network as a random variable with mean μ_(wij) and standard deviation σ_(wij). Each weight random variable is sampled on each forward activation, consequently creating an exponential number of potential networks with shared weights (accumulated in the mean values). Both parameters are updated according to prediction error, thus resulting in weight noise injections that reflect a local history of prediction error and local model averaging. SDR therefore implements a more sensitive, local, gradient-dependent simulated annealing per weight, converging in the limit to a Bayes-optimal network. We run tests on standard benchmarks (CIFAR and ImageNet) using a modified version of DenseNet and show that SDR outperforms standard Dropout in top-5 validation error by approximately 13% with DenseNet-BC 121 on ImageNet and find various validation error improvements in smaller networks. We also show that SDR reaches the same accuracy that Dropout attains in 100 epochs in as few as 40 epochs, as well as improvements in training error by as much as 80%.
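
The per-weight mechanics described in the abstract can be sketched in a few lines. Below is a minimal, illustrative NumPy sketch of the stochastic delta rule for a single fully connected layer under a squared-error loss; the class name SDRLayer and the hyperparameters alpha (learning rate for the means), beta (noise growth rate), and zeta (noise decay factor) are hypothetical placeholders and are not taken from the paper or its DenseNet experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

class SDRLayer:
    """Linear layer whose weights follow a stochastic delta rule (illustrative sketch)."""

    def __init__(self, n_in, n_out, alpha=0.1, beta=0.02, zeta=0.95):
        self.mu = rng.normal(0.0, 0.1, size=(n_in, n_out))   # weight means mu_wij
        self.sigma = np.full((n_in, n_out), 0.05)            # weight std. deviations sigma_wij
        self.alpha, self.beta, self.zeta = alpha, beta, zeta

    def forward(self, x):
        # Sample a fresh realization of every weight on each forward pass,
        # so repeated passes visit many shared-weight networks.
        self.w_sample = self.mu + self.sigma * rng.standard_normal(self.mu.shape)
        self.x = x
        return x @ self.w_sample

    def backward(self, grad_out):
        # Gradient of the loss with respect to the sampled weights.
        grad_w = self.x.T @ grad_out
        # The mean follows the ordinary delta-rule (gradient) update.
        self.mu -= self.alpha * grad_w
        # The standard deviation grows with the local error signal and then decays
        # geometrically, giving a per-weight, gradient-dependent annealing of the noise.
        self.sigma = self.zeta * (self.sigma + self.beta * np.abs(grad_w))
        # Gradient with respect to the layer input.
        return grad_out @ self.w_sample.T

# Minimal usage: one training step against a zero target under squared error.
layer = SDRLayer(4, 3)
x = rng.standard_normal((8, 4))
y = layer.forward(x)
layer.backward(y)  # dL/dy for L = 0.5 * ||y - 0||^2
```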

Record details

  • Source
    Neural Computation | 2020, Issue 5 | pp. 1018-1032 | 15 pages
  • Author affiliations

    Rotman Research Institute, Baycrest Health Sciences, Toronto, ON M6A 2E1, Canada;

    Rutgers University Brain Imaging Center, Newark, NJ 07103, U.S.A.;

  • Indexed in: Science Citation Index (SCI), USA; Chemical Abstracts (CA), USA
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification:
  • Keywords:
