Asian Conference on Computer Vision

Semi-Supervised Learning on a Budget: Scaling Up to Large Datasets



Abstract

Internet data sources provide us with large image datasets that mostly lack any explicit labeling. This setting is ideal for semi-supervised learning, which seeks to exploit labeled data together with a large pool of unlabeled data points to improve learning and classification. While considerable progress has been made on the theory and algorithms, there has been limited success in translating this progress to the large-scale datasets that inspired these methods. We investigate the computational complexity of popular graph-based semi-supervised learning algorithms together with several possible speed-ups. Our findings lead to a new algorithm that scales to datasets up to 40 times larger than previous approaches while even improving classification performance. Our method is based on the key insight that, by employing a density-based measure, unlabeled data points can be selected in a manner similar to an active learning scheme. This leads to a compact graph, improving performance by up to 11.6% at reduced computational cost.
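The two ingredients the abstract names — graph-based label propagation and a density-based selection of unlabeled points — can be sketched as follows. This is a minimal illustration of the general technique, not the paper's algorithm: the density score (inverse mean k-NN distance), the Gaussian affinity, and all parameter names here are assumptions for the sake of a runnable example.

```python
import numpy as np


def density_select(X_unlabeled, n_select, k=5):
    """Rank unlabeled points by a simple density proxy (inverse mean
    distance to the k nearest neighbours) and keep the densest ones.
    A hypothetical stand-in for the paper's density-based measure."""
    d = np.linalg.norm(X_unlabeled[:, None, :] - X_unlabeled[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-distance
    knn_mean = np.sort(d, axis=1)[:, :k].mean(axis=1)
    density = 1.0 / (knn_mean + 1e-12)
    return np.argsort(-density)[:n_select]      # indices of densest points


def label_propagation(X, y, n_labeled, sigma=1.0, n_iter=100):
    """Classic Gaussian-affinity label propagation on a dense graph:
    iterate F <- P F with the labeled rows clamped to their labels.
    The first n_labeled rows of X / y are the labeled points."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.exp(-d**2 / (2 * sigma**2))          # Gaussian affinities
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)        # row-stochastic transition matrix
    classes = np.unique(y[:n_labeled])
    F = np.zeros((n, len(classes)))             # soft label matrix
    F[np.arange(n_labeled), np.searchsorted(classes, y[:n_labeled])] = 1.0
    Y_clamp = F[:n_labeled].copy()
    for _ in range(n_iter):
        F = P @ F                               # diffuse labels over the graph
        F[:n_labeled] = Y_clamp                 # re-clamp the labeled points
    return classes[F.argmax(axis=1)]
```

A compact-graph variant in this spirit would first run `density_select` on the unlabeled pool and build the propagation graph only over the labeled points plus the selected subset, which is where the computational savings come from.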

