International Conference on Machine Learning

Why bigger is not always better: on finite and infinite neural networks



Abstract

Recent work has argued that neural networks can be understood theoretically by taking the number of channels to infinity, at which point the outputs become Gaussian process (GP) distributed. However, we note that infinite Bayesian neural networks lack a key facet of the behaviour of real neural networks: the fixed kernel, determined only by network hyperparameters, implies that they cannot do any form of representation learning. The lack of representation or equivalently kernel learning leads to less flexibility and hence worse performance, giving a potential explanation for the inferior performance of infinite networks observed in the literature (e.g. Novak et al. 2019). We give analytic results characterising the prior over representations and representation learning in finite deep linear networks. We show empirically that the representations in SOTA architectures such as ResNets trained with SGD are much closer to those suggested by our deep linear results than by the corresponding infinite network. This motivates the introduction of a new class of network: infinite networks with bottlenecks, which inherit the theoretical tractability of infinite networks while at the same time allowing representation learning.
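The following is a minimal sketch (not the authors' code) of the standard NNGP kernel recursion for a fully connected ReLU network, included to illustrate the abstract's central point: in the infinite-width limit the kernel is a fixed function of the inputs and the network hyperparameters (here assumed to be depth, sigma_w2 and sigma_b2), with no dependence on trained weights or targets, so no representation learning can take place.

import numpy as np

def nngp_kernel(X, depth=3, sigma_w2=2.0, sigma_b2=0.1):
    """Depth-`depth` NNGP kernel matrix for a fully connected ReLU network.

    The result depends only on the inputs X and the hyperparameters
    (depth, sigma_w2, sigma_b2); it never sees training targets.
    """
    n, d = X.shape
    # Layer-0 (input) kernel: sigma_b^2 + sigma_w^2 * <x, x'> / d
    K = sigma_b2 + sigma_w2 * (X @ X.T) / d
    for _ in range(depth):
        diag = np.sqrt(np.clip(np.diag(K), 1e-12, None))
        norm = np.outer(diag, diag)
        cos_theta = np.clip(K / norm, -1.0, 1.0)
        theta = np.arccos(cos_theta)
        # Arc-cosine kernel of order 1 (Cho & Saul, 2009): the exact
        # expectation E[ReLU(f) ReLU(f')] under the previous-layer GP.
        K = sigma_b2 + sigma_w2 * norm * (
            np.sin(theta) + (np.pi - theta) * cos_theta
        ) / (2 * np.pi)
    return K

# Example: the kernel for random inputs is entirely determined by the
# hyperparameters above; changing the training labels would not change it.
X = np.random.randn(5, 10)
print(nngp_kernel(X))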