IEEE International Symposium on Information Theory

Analytic Study of Double Descent in Binary Classification: The Impact of Loss

Abstract

Extensive empirical evidence reveals that, for a wide range of different learning methods and data sets, the risk curve exhibits a double-descent (DD) trend as a function of the model size. In our recent coauthored paper [Deng et al., '19], we proposed simple binary linear classification models and showed that the test error of gradient descent (GD) with logistic loss undergoes a DD. In this paper, we complement these results by extending them to GD with square loss. We show that the DD phenomenon persists, but we also identify several differences compared to logistic loss. This emphasizes that crucial features of DD curves (such as their transition threshold and global minima) depend both on the training data and on the learning algorithm. We further study the dependence of DD curves on the size of the training set. Similar to [Deng et al., '19], our results are analytic: we plot the DD curves by first deriving sharp asymptotics for the test error under Gaussian features. Albeit simple, the models permit a principled study, the outcomes of which theoretically corroborate related empirical findings occurring in more complex learning tasks.
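The paper plots its DD curves from closed-form asymptotics, but the qualitative shape is easy to reproduce numerically. Below is a minimal simulation sketch under assumptions of our own choosing (Gaussian features, noiseless labels y = sign(x'beta*), and a learner restricted to the first p of D coordinates), not the paper's exact data model; the names n, D, beta_star, and test_error are illustrative. It relies on the fact that GD on the square loss from zero initialization converges to the minimum-norm least-squares solution, which the sketch computes directly with a pseudoinverse.

```python
# Minimal numerical sketch of a double-descent curve for binary
# classification trained with square loss. Illustrative assumptions,
# NOT the paper's exact model: Gaussian features, noiseless labels
# y = sign(x . beta*), and a learner that only sees the first p
# coordinates of each sample. GD on the square loss from zero
# initialization converges to the minimum-norm least-squares
# solution, computed here directly via a pseudoinverse.
import numpy as np

rng = np.random.default_rng(0)
n, D, n_test, trials = 100, 400, 2000, 20  # train size, ambient dim (illustrative)

def test_error(p):
    """Average 0/1 test error of min-norm least squares on p features."""
    errs = []
    for _ in range(trials):
        beta_star = rng.standard_normal(D) / np.sqrt(D)  # hidden direction
        X = rng.standard_normal((n, D))
        y = np.sign(X @ beta_star)                       # binary labels
        X_test = rng.standard_normal((n_test, D))
        y_test = np.sign(X_test @ beta_star)
        # Min-norm least-squares fit on the first p features (= GD limit).
        beta_hat = np.linalg.pinv(X[:, :p]) @ y
        errs.append(np.mean(np.sign(X_test[:, :p] @ beta_hat) != y_test))
    return float(np.mean(errs))

for p in [10, 25, 50, 75, 90, 100, 110, 125, 150, 200, 300, 400]:
    print(f"p = {p:3d}  (p/n = {p / n:.2f})  test error = {test_error(p):.3f}")
```

Sweeping p through the interpolation threshold p = n should show the test error spike near p ≈ n and then descend again as p grows, the DD trend the abstract describes for square loss.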