Asian Conference on Computer Vision

Semi-Supervised Learning on a Budget: Scaling Up to Large Datasets



Abstract

Internet data sources provide us with large image datasets that mostly lack any explicit labeling. This setting is ideal for semi-supervised learning, which seeks to exploit labeled data together with a large pool of unlabeled data points to improve learning and classification. While considerable progress has been made on the theory and algorithms, there has been limited success in translating this progress to the large-scale datasets that inspired these methods. We investigate the computational complexity of popular graph-based semi-supervised learning algorithms together with several possible speed-ups. Our findings lead to a new algorithm that scales to datasets up to 40 times larger than previous approaches while even improving classification performance. Our method is based on the key insight that, by employing a density-based measure, unlabeled data points can be selected in a manner similar to an active learning scheme. This leads to a compact graph, improving performance by up to 11.6% at reduced computational cost.
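The two ingredients the abstract names — graph-based label propagation and a density-based selection of unlabeled points — can be sketched as follows. This is a minimal illustration of the general technique, not the paper's algorithm: the density score (inverse mean k-NN distance), the Gaussian affinity, and all parameter names here are assumptions for the sake of a runnable example.

```python
import numpy as np


def density_select(X_unlabeled, n_select, k=5):
    """Rank unlabeled points by a simple density proxy (inverse mean
    distance to the k nearest neighbours) and keep the densest ones.
    A hypothetical stand-in for the paper's density-based measure."""
    d = np.linalg.norm(X_unlabeled[:, None, :] - X_unlabeled[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-distance
    knn_mean = np.sort(d, axis=1)[:, :k].mean(axis=1)
    density = 1.0 / (knn_mean + 1e-12)
    return np.argsort(-density)[:n_select]      # indices of densest points


def label_propagation(X, y, n_labeled, sigma=1.0, n_iter=100):
    """Classic Gaussian-affinity label propagation on a dense graph:
    iterate F <- P F with the labeled rows clamped to their labels.
    The first n_labeled rows of X / y are the labeled points."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.exp(-d**2 / (2 * sigma**2))          # Gaussian affinities
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)        # row-stochastic transition matrix
    classes = np.unique(y[:n_labeled])
    F = np.zeros((n, len(classes)))             # soft label matrix
    F[np.arange(n_labeled), np.searchsorted(classes, y[:n_labeled])] = 1.0
    Y_clamp = F[:n_labeled].copy()
    for _ in range(n_iter):
        F = P @ F                               # diffuse labels over the graph
        F[:n_labeled] = Y_clamp                 # re-clamp the labeled points
    return classes[F.argmax(axis=1)]
```

A compact-graph variant in this spirit would first run `density_select` on the unlabeled pool and build the propagation graph only over the labeled points plus the selected subset, which is where the computational savings come from.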

