
Understanding the Disharmony Between Dropout and Batch Normalization by Variance Shift



Abstract

This paper first answers the question "why do the two most powerful techniques, Dropout and Batch Normalization (BN), often lead to worse performance when they are combined?" from both theoretical and statistical aspects. Theoretically, we find that Dropout shifts the variance of a specific neural unit when the network transfers from the training state to the test state. BN, however, maintains in the test phase the statistical variance accumulated over the entire learning procedure. The inconsistency of that variance (we name this scheme "variance shift") causes unstable numerical behavior at inference and ultimately leads to more erroneous predictions when Dropout is applied before BN. Thorough experiments on DenseNet, ResNet, ResNeXt and Wide ResNet confirm our findings. Guided by the uncovered mechanism, we then explore several strategies that modify Dropout and try to overcome the limitations of the combination by avoiding the variance-shift risks.
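The variance shift described above can be illustrated numerically. The following is a minimal sketch, not code from the paper: it assumes inverted Dropout with a keep probability keep_p and a zero-mean, unit-variance activation x feeding a Dropout layer followed by BN, and compares the variance BN would accumulate during training with the variance it actually receives at test time.

import numpy as np

rng = np.random.default_rng(0)
keep_p = 0.5                                  # probability of keeping a unit (assumed)
x = rng.standard_normal(1_000_000)            # zero-mean, unit-variance activations

# Train phase: inverted Dropout keeps each unit with probability keep_p
# and rescales by 1/keep_p; a following BN layer accumulates the variance
# of this masked-and-rescaled signal in its running statistics.
mask = rng.random(x.shape) < keep_p
train_act = x * mask / keep_p
running_var = train_act.var()                 # roughly 1/keep_p = 2.0 here

# Test phase: Dropout becomes the identity, so the BN layer normalizes
# with its stale running variance rather than the variance it now sees.
test_var = x.var()                            # roughly 1.0

print(f"variance accumulated by BN (train): {running_var:.3f}")
print(f"variance BN receives (test):        {test_var:.3f}")
print(f"variance shift ratio:               {running_var / test_var:.2f}")

Under these assumptions the ratio is roughly 1/keep_p (about 2 for keep_p = 0.5), i.e. BN normalizes test-time activations with a running variance about twice as large as the variance it actually encounters, which is one concrete instance of the variance shift the abstract describes.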
