
Doing the Impossible: Why Neural Networks Can Be Trained at All


Abstract

As deep neural networks grow in size, from thousands to millions to billions of weights, the performance of those networks becomes limited by our ability to accurately train them. A common naive question arises: if we have a system with billions of degrees of freedom, don't we also need billions of samples to train it? Of course, the success of deep learning indicates that reliable models can be learned with reasonable amounts of data. Similar questions arise in protein folding, spin glasses and biological neural networks. With effectively infinite potential folding/spin/wiring configurations, how does the system find the precise arrangement that leads to useful and robust results? Simple sampling of the possible configurations until an optimal one is reached is not a viable option even if one waited for the age of the universe. On the contrary, there appears to be a mechanism in the above phenomena that forces them to achieve configurations that live on a low-dimensional manifold, avoiding the curse of dimensionality. In the current work we use the concept of mutual information between successive layers of a deep neural network to elucidate this mechanism and suggest possible ways of exploiting it to accelerate training. We show that adding structure to the neural network leads to higher mutual information between layers. High mutual information between layers implies that the effective number of free parameters is exponentially smaller than the raw number of tunable weights, providing insight into why neural networks with far more weights than training points can be reliably trained.
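To make the mutual-information idea concrete, below is a minimal, self-contained sketch (not the authors' code) of estimating the mutual information between activations in successive layers of a small random MLP, using a simple 2-D histogram estimator averaged over unit pairs. The layer sizes, the tanh nonlinearity, the binning scheme, and the use of a low-rank weight matrix as a stand-in for "added structure" are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def pairwise_mi(a, b, n_bins=20):
    """Histogram estimate (in nats) of I(a; b) for two 1-D activation samples."""
    joint, _, _ = np.histogram2d(a, b, bins=n_bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz])))

def layer_mi(h1, h2, n_pairs=200, n_bins=20):
    """Average MI over randomly sampled (unit in layer l, unit in layer l+1) pairs."""
    i = rng.integers(0, h1.shape[1], n_pairs)
    j = rng.integers(0, h2.shape[1], n_pairs)
    return np.mean([pairwise_mi(h1[:, a], h2[:, b], n_bins) for a, b in zip(i, j)])

# Forward a batch of Gaussian inputs through two tanh layers.
n_samples, d_in, d_hidden = 5000, 64, 64
x = rng.standard_normal((n_samples, d_in))

def forward(w1, w2):
    h1 = np.tanh(x @ w1)
    h2 = np.tanh(h1 @ w2)
    return h1, h2

# Unstructured case: fully random dense weights.
w1 = rng.standard_normal((d_in, d_hidden)) / np.sqrt(d_in)
w2 = rng.standard_normal((d_hidden, d_hidden)) / np.sqrt(d_hidden)
h1, h2 = forward(w1, w2)
print("dense second layer:    I(h1; h2) ~", round(layer_mi(h1, h2), 3))

# "Structured" case: the second layer constrained to rank r, one crude proxy
# for the kind of structure the abstract says raises inter-layer MI.
r = 4
w2_lowrank = (rng.standard_normal((d_hidden, r)) @
              rng.standard_normal((r, d_hidden))) / np.sqrt(d_hidden * r)
h1, h2 = forward(w1, w2_lowrank)
print("low-rank second layer: I(h1; h2) ~", round(layer_mi(h1, h2), 3))
```

A histogram estimator is chosen here only because it is easy to read; in practice, binned estimates of mutual information are biased for small samples or many bins, so any quantitative study would need a more careful estimator and larger batches.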
