Robust Active Label Correction

Abstract

Active label correction addresses the problem of learning from input data for which noisy labels are available (e.g., from imprecise measurements or crowd-sourcing) and each true label can be obtained at a significant cost (e.g., through additional measurements or human experts). To minimize these costs, we are interested in identifying training patterns for which knowing the true labels maximally improves the learning performance. We approximate the true label noise by a model that learns the aspects of the noise that are class-conditional (i.e., independent of the input given the observed label). To select labels for correction, we adopt the active learning strategy of maximizing the expected model change. We consider the change in regularized empirical risk functionals that use different pointwise loss functions for patterns with noisy and true labels, respectively. Different loss functions for the noisy data lead to different active label correction algorithms. If loss functions consider the label noise rates, these rates are estimated during learning, where importance weighting compensates for the sampling bias. We show empirically that viewing the true label as a latent variable and computing the maximum likelihood estimate of the model parameters performs well across all considered problems. A maximum a posteriori estimate of the model parameters was beneficial in most test cases. An image classification experiment using convolutional neural networks demonstrates that the class-conditional noise model, which can be learned efficiently, can guide re-labeling in real-world applications.
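To make the latent-variable view concrete, the following is a minimal sketch of one common way to write a class-conditional noise model, not the authors' implementation. It assumes a noise matrix `noise[y][observed]` holding p(observed label | true label y) and a classifier output `class_probs[y]` holding p(y | x); the function names and this parameterization are illustrative assumptions, and the paper's exact formulation may differ. Marginalizing over the unknown true label gives the likelihood of the noisy label, and Bayes' rule gives a posterior over the true label.

```python
import math


def noisy_label_likelihood(class_probs, noise, observed):
    """p(observed | x) = sum over y of p(observed | y) * p(y | x).

    Assumption: noise[y][observed] is the class-conditional noise rate.
    """
    return sum(noise[y][observed] * class_probs[y]
               for y in range(len(class_probs)))


def true_label_posterior(class_probs, noise, observed):
    """p(y | x, observed) by Bayes' rule, with the true label latent."""
    joint = [noise[y][observed] * class_probs[y]
             for y in range(len(class_probs))]
    total = sum(joint)
    return [j / total for j in joint]


def latent_nll(class_probs, noise, observed):
    """Negative log-likelihood of one noisy label under the latent model."""
    return -math.log(noisy_label_likelihood(class_probs, noise, observed))


# Hypothetical two-class example: the classifier favors class 0,
# but the observed (possibly noisy) label is class 1.
class_probs = [0.7, 0.3]
noise = [[0.9, 0.1],   # true class 0: observed as 0 or 1
         [0.2, 0.8]]   # true class 1: observed as 0 or 1
likelihood = noisy_label_likelihood(class_probs, noise, observed=1)
posterior = true_label_posterior(class_probs, noise, observed=1)
```

Summing `latent_nll` over the noisily labeled patterns gives a maximum likelihood objective of the kind the abstract describes. Adding a prior term on the model parameters would turn it into a maximum a posteriori objective.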
