AAAI Conference on Artificial Intelligence

The Goldilocks Zone: Towards Better Understanding of Neural Network Loss Landscapes



Abstract

We explore the loss landscape of fully-connected and convolutional neural networks using random, low-dimensional hyperplanes and hyperspheres. Evaluating the Hessian, H, of the loss function on these hypersurfaces, we observe 1) an unusual excess of the number of positive eigenvalues of H, and 2) a large value of Tr(H)/||H|| at a well-defined range of configuration space radii, corresponding to a thick, hollow, spherical shell we refer to as the Goldilocks zone. We observe this effect for fully-connected neural networks over a range of network widths and depths on MNIST and CIFAR-10 datasets with the ReLU and tanh non-linearities, and a similar effect for convolutional networks. Using our observations, we demonstrate a close connection between the Goldilocks zone, measures of local convexity/prevalence of positive curvature, and the suitability of a network initialization. We show that the high and stable accuracy reached when optimizing on random, low-dimensional hypersurfaces is directly related to the overlap between the hypersurface and the Goldilocks zone, and as a corollary demonstrate that the notion of intrinsic dimension is initialization-dependent. We note that common initialization techniques initialize neural networks in this particular region of unusually high convexity/prevalence of positive curvature, and offer a geometric intuition for their success. Furthermore, we demonstrate that initializing a neural network at a number of points and selecting for high measures of local convexity such as Tr(H)/||H||, number of positive eigenvalues of H, or low initial loss, leads to statistically significantly faster training on MNIST. Based on our observations, we hypothesize that the Goldilocks zone contains an unusually high density of suitable initialization configurations.
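The measurement described in the abstract can be sketched numerically: restrict the loss to a random d-dimensional hyperplane through a point at a chosen configuration-space radius, estimate the Hessian of the restricted loss by finite differences, and report the fraction of positive eigenvalues and Tr(H)/||H||. The tiny tanh network, synthetic data, and finite-difference step below are illustrative assumptions, not the paper's actual setup (which uses MNIST/CIFAR-10 and exact second derivatives):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic binary-classification problem (stand-in for MNIST).
X = rng.standard_normal((64, 8))
y = rng.integers(0, 2, 64)

# Two-layer tanh network with a scalar logit; parameters as one flat vector.
n_hid = 6
n_params = 8 * n_hid + n_hid  # W1 is 8 x n_hid, w2 is n_hid (biases omitted)

def loss(theta):
    W1 = theta[:8 * n_hid].reshape(8, n_hid)
    w2 = theta[8 * n_hid:]
    logits = np.tanh(X @ W1) @ w2
    # Logistic loss with labels mapped to {-1, +1}.
    return np.mean(np.log1p(np.exp(-(2 * y - 1) * logits)))

# Random point at a fixed configuration-space radius.
radius = 1.0
theta0 = rng.standard_normal(n_params)
theta0 *= radius / np.linalg.norm(theta0)

# Random d-dimensional hyperplane through theta0: orthonormal basis V.
d = 4
V, _ = np.linalg.qr(rng.standard_normal((n_params, d)))

def restricted_hessian(eps=1e-4):
    """Finite-difference Hessian of z -> loss(theta0 + V @ z) at z = 0."""
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            pp = loss(theta0 + eps * (V[:, i] + V[:, j]))
            pm = loss(theta0 + eps * (V[:, i] - V[:, j]))
            mp = loss(theta0 + eps * (-V[:, i] + V[:, j]))
            mm = loss(theta0 + eps * (-V[:, i] - V[:, j]))
            H[i, j] = (pp - pm - mp + mm) / (4 * eps**2)
    return 0.5 * (H + H.T)  # symmetrize away finite-difference noise

H = restricted_hessian()
eigs = np.linalg.eigvalsh(H)
frac_positive = np.mean(eigs > 0)          # excess of positive eigenvalues
convexity = np.trace(H) / np.linalg.norm(H)  # Tr(H) / ||H||_F
print(f"positive-eigenvalue fraction: {frac_positive:.2f}")
print(f"Tr(H)/||H||: {convexity:.3f}")
```

Sweeping `radius` over several values and averaging these two statistics across random points and hyperplanes is one way to trace out the shell-like region the abstract calls the Goldilocks zone.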

