Source: JMLR: Workshop and Conference Proceedings

A Corrective View of Neural Networks: Representation, Memorization and Learning


Abstract

We develop a \emph{corrective mechanism} for neural network approximation: the total available non-linear units are divided into multiple groups and the first group approximates the function under consideration, the second group approximates the error in approximation produced by the first group and corrects it, the third group approximates the error produced by the first and second groups together, and so on. This technique yields several new representation and learning results for neural networks:

1. Two-layer neural networks in the random features (RF) regime can memorize arbitrary labels for $n$ arbitrary points in $\mathbb{R}^d$ with $\tilde{O}(\frac{n}{\theta^4})$ ReLUs, where $\theta$ is the minimum distance between two different points. This bound can be shown to be optimal in $n$ up to logarithmic factors.

2. Two-layer neural networks with ReLUs and smoothed ReLUs can represent functions with an error of at most $\epsilon$ using $O(C(a,d)\,\epsilon^{-1/(a+1)})$ units for $a \in \mathbb{N} \cup \{0\}$ when the function has $\Theta(ad)$ bounded derivatives. In certain cases $d$ can be replaced with the effective dimension $q \ll d$. Our results indicate that neural networks with only a single nonlinear layer are surprisingly powerful with regard to representation, and show that, in contrast to what is suggested in recent work, depth is not needed in order to represent highly smooth functions.

3. Gradient descent on the recombination weights of a two-layer random features network with ReLUs and smoothed ReLUs can learn low-degree polynomials up to squared error $\epsilon$ with $\mathrm{subpoly}(1/\epsilon)$ units. Even though deep networks can approximate these polynomials with $\mathrm{polylog}(1/\epsilon)$ units, existing \emph{learning} bounds for this problem require $\mathrm{poly}(1/\epsilon)$ units. To the best of our knowledge, our results give the first sub-polynomial learning guarantees for this problem.
