Understanding the Disharmony between Weight Normalization Family and Weight Decay

Abstract

The merits of fast convergence and potentially better performance of the weight normalization family have drawn increasing attention in recent years. These methods use standardization or normalization that changes the weight W to W', which makes W' independent of the magnitude of W. Surprisingly, W must be decayed during gradient descent, otherwise we observe a severe under-fitting problem, which is very counter-intuitive since weight decay is widely known to prevent deep networks from over-fitting. Moreover, if we substitute (e.g., for weight normalization) W' = W/||W|| into the original loss function Σ_i L(f(x_i; W'), y_i) + (λ/2)||W'||², the regularization term (λ/2)||W'||² is canceled as the constant λ/2 in the optimization objective. Therefore, to decay W, we need to explicitly append the term (λ/2)||W||². In this paper, we theoretically prove that (λ/2)||W||² improves optimization only by modulating the effective learning rate, and has virtually no influence on generalization when it is composed with the weight normalization family. Furthermore, we expose several serious problems that arise when the weight decay term is introduced to the weight normalization family, including the absence of a global minimum, training instability, and sensitivity to initialization. To address these problems, we propose an Adaptive Weight Shrink (AWS) scheme, which gradually shrinks the weights during optimization by a dynamic coefficient proportional to the magnitude of the parameter. This simple yet effective method appropriately controls the effective learning rate, significantly improves training stability, and makes optimization more robust to initialization.
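
The scale-invariance argument in the abstract can be checked numerically. The following minimal PyTorch sketch (toy data and a single linear unit, not taken from the paper) evaluates the objective Σ_i L(f(x_i; W'), y_i) + (λ/2)||W'||² at two magnitudes of W sharing one direction: the loss, including the regularizer, is unchanged, while the gradient norm scales as 1/||W||, illustrating why ||W|| acts only through the effective learning rate.

```python
import torch

torch.manual_seed(0)
x = torch.randn(8)          # toy input (illustrative, not from the paper)
y = torch.tensor(1.0)       # toy target
lam = 1e-2                  # weight decay coefficient λ
direction = torch.randn(8)  # fixed direction for W

def objective(W):
    W_prime = W / W.norm()                 # weight normalization: W' = W/||W||
    fit = (W_prime @ x - y) ** 2           # L(f(x; W'), y)
    reg = 0.5 * lam * W_prime.norm() ** 2  # (λ/2)||W'||² — always the constant λ/2
    return fit + reg

for scale in (1.0, 10.0):
    W = (scale * direction).clone().requires_grad_()
    loss = objective(W)
    loss.backward()
    print(f"||W|| = {W.norm().item():5.2f}  loss = {loss.item():.6f}  "
          f"||dL/dW|| = {W.grad.norm().item():.6f}")

# Both magnitudes give the identical loss (the regularizer cancels to λ/2),
# but the gradient norm is 10x smaller at 10x the magnitude: decaying W can
# therefore affect optimization only via the effective learning rate.
```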
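
The abstract describes AWS only at a high level: weights are gradually shrunk by a dynamic coefficient proportional to the parameter's magnitude. The update rule below is an assumed illustrative form (in particular, the coefficient shrink = eta * ||W|| is a hypothetical choice, not the paper's formula); the full text should be consulted for the actual scheme.

```python
import torch

def aws_step(W: torch.Tensor, grad: torch.Tensor,
             lr: float = 0.1, eta: float = 1e-3) -> torch.Tensor:
    """One hypothetical Adaptive Weight Shrink (AWS) update — a sketch only.

    Per the abstract, the weights are shrunk by a dynamic coefficient
    proportional to the parameter magnitude; the specific choice
    shrink = eta * ||W|| here is an assumption, not the paper's formula.
    """
    with torch.no_grad():
        W -= lr * grad                            # plain gradient descent step
        shrink = eta * W.norm()                   # dynamic coefficient ∝ ||W|| (assumed)
        W *= torch.clamp(1.0 - shrink, min=0.0)   # gradually shrink the weights
    return W
```

Tying the shrink strength to ||W|| keeps the weight magnitude, and hence the effective learning rate, within a controlled range, which is the stability property the abstract attributes to AWS.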
