Classification and clustering algorithms are commonly used for knowledge management and information retrieval. When only a small amount of labeled data is available alongside a large volume of unlabeled data, a semi-supervised learning (SSL) algorithm is often preferred. Under the cluster or manifold assumption, SSL approaches usually gain more as more unlabeled data are used for learning. In this paper, we adopt a graph-based SSL algorithm to solve the problem. However, graph-based SSL algorithms do not scale to large numbers of unlabeled samples and, in their original form, work only in a transductive setting. We therefore propose a scalable graph-based SSL algorithm that addresses both problems through Gaussian mixture model label propagation. Experiments conducted on a real dataset illustrate the effectiveness of the proposed algorithm.