Conference: International Conference on Development and Learning and Epigenetic Robotics

Training-ValueNet: Data Driven Label Noise Cleaning on Weakly-Supervised Web Images

Abstract

Manually labelling new datasets for image classification remains expensive and time-consuming. A promising alternative is to utilize the abundance of images on the web, for which search queries or surrounding text offer a natural source of weak supervision. Unfortunately, the label noise in these datasets has limited their use in practice. Several methods have been proposed for performing unsupervised label noise cleaning, the majority of which use outlier detection to identify and remove mislabeled images. In this paper, we argue that outlier detection is an inherently unsuitable approach for this task due to major flaws in the assumptions it makes about the distribution of mislabeled images. We propose an alternative approach which makes no such assumptions. Rather than looking for outliers, we observe that mislabeled images can be identified by the detrimental impact they have on the performance of an image classifier. We introduce training-value as an objective measure of the contribution each training example makes to the validation loss. We then present the training-value approximation network (Training-ValueNet), which learns a mapping between each image and its training-value. We demonstrate that by simply discarding images with a negative training-value, Training-ValueNet is able to significantly improve classification performance on a held-out test set, outperforming the state of the art in outlier detection by a large margin.
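
The idea of scoring each training example by its effect on the validation loss, then discarding examples with a negative score, can be illustrated with a toy sketch. The abstract does not specify how training-value is estimated, so the code below assumes one plausible reading: the training-value of an example is the accumulated drop in validation loss caused by its own single-example gradient steps. The one-dimensional logistic model, the data, and all function names (`training_values`, `sgd_step`, `log_loss`) are hypothetical, and the approximation network itself (Training-ValueNet) is not modelled here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, data):
    """Mean binary cross-entropy of a 1-D logistic model over (x, y) pairs."""
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)

def sgd_step(w, b, x, y, lr):
    """One gradient-descent step on a single training example."""
    err = sigmoid(w * x + b) - y
    return w - lr * err * x, b - lr * err

def training_values(train, val, lr=0.5, epochs=20):
    """Assumed estimator: sum, per example, of the validation-loss
    decrease observed immediately after that example's update."""
    w, b = 0.0, 0.0
    values = [0.0] * len(train)
    for _ in range(epochs):
        for i, (x, y) in enumerate(train):
            before = log_loss(w, b, val)
            w, b = sgd_step(w, b, x, y, lr)
            values[i] += before - log_loss(w, b, val)
    return values

# Toy data: the true label is 1 when x > 0. The last training example
# is deliberately mislabeled; the validation set is clean.
train = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1), (1.5, 0)]
val = [(-1.5, 0), (-0.5, 0), (0.5, 1), (1.5, 1)]

values = training_values(train, val)

# Keep only examples whose estimated training-value is non-negative.
cleaned = [ex for ex, v in zip(train, values) if v >= 0]
```

In this toy setup the mislabeled example `(1.5, 0)` receives a negative value, because each of its updates pushes the model away from the clean validation labels, so it is dropped from `cleaned`. Note that this filter relies only on the measured effect on validation loss and makes no assumption about where mislabeled points lie in feature space.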
