Power Function Error Initialization Can Improve Convergence of Backpropagation Learning in Neural Networks for Classification


Abstract

Supervised learning corresponds to minimizing a loss or cost function expressing the differences between the model predictions y_n and the target values t_n given by the training data. In neural networks, this means backpropagating error signals through the transposed weight matrices from the output layer toward the input layer. For this, error signals in the output layer are typically initialized by the difference y_n − t_n, which is optimal for several commonly used loss functions such as cross-entropy or the sum of squared errors. Here I evaluate a more general error initialization method using power functions |y_n − t_n|^q for q > 0, corresponding to a new family of loss functions that generalize cross-entropy. Surprisingly, experiments on various learning tasks reveal that a proper choice of q can significantly improve the speed and convergence of backpropagation learning, in particular in deep and recurrent neural networks. The results suggest two main reasons for the observed improvements. First, compared to cross-entropy, the new loss functions provide better fits to the distribution of error signals in the output layer and therefore maximize the model's likelihood more efficiently. Second, the new error initialization procedure may often provide a better gradient-to-loss ratio over a broad range of neural output activity, thereby avoiding flat loss landscapes with vanishing gradients.
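As a concrete illustration of the method described in the abstract, here is a minimal sketch of power-function error initialization in a tiny one-hidden-layer classifier. It assumes the generalized error signal takes the form sign(y_n − t_n) · |y_n − t_n|^q at the output layer, so that q = 1 recovers the standard initialization y_n − t_n; the helper name power_error, the network sizes, and the value q = 0.7 are illustrative choices, not taken from the paper.

```python
import numpy as np

def power_error(y, t, q):
    # Generalized output-layer error initialization:
    # sign(y - t) * |y - t|**q; q = 1 gives the usual y - t.
    d = y - t
    return np.sign(d) * np.abs(d) ** q

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (20, 4))    # input -> hidden weights
W2 = rng.normal(0.0, 0.1, (3, 20))    # hidden -> output weights
x = rng.normal(size=4)                # one input pattern
t = np.array([1.0, 0.0, 0.0])         # one-hot class target
lr, q = 0.1, 0.7                      # q is the tunable exponent (illustrative value)

for step in range(100):
    h = sigmoid(W1 @ x)               # forward pass
    y = sigmoid(W2 @ h)
    delta2 = power_error(y, t, q)     # error initialization at the output layer
    delta1 = (W2.T @ delta2) * h * (1.0 - h)  # backpropagate through the transposed weights
    W2 -= lr * np.outer(delta2, h)    # gradient-style weight updates
    W1 -= lr * np.outer(delta1, x)
```

For sigmoid output units, q = 1 corresponds to ordinary cross-entropy backpropagation; the abstract's claim is that other exponents reshape the error signal and can improve the speed and convergence of learning.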

Bibliographic Details

  • Source: Neural Computation, 2021, Issue 8, pp. 2193-2225 (33 pages)
  • Author: Andreas Knoblauch

  • Affiliation: Albstadt-Sigmaringen University, Albstadt 72458, Germany

  • Indexed in: Science Citation Index (SCI); Chemical Abstracts (CA)
  • Format: PDF
  • Language: English
