We consider semi-supervised classification when part of the available data is unlabeled. These unlabeled data can be useful for the classification problem when we make an assumption relating the behavior of the regression function to that of the marginal distribution. Seeger (2000) proposed the well-known cluster assumption as a reasonable one. We propose a mathematical formulation of this assumption and a method based on density level set estimation that takes advantage of it to achieve fast rates of convergence both in the number of unlabeled examples and in the number of labeled examples.
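The abstract does not spell out the estimator, so what follows is only a hedged illustration of the general idea, not the authors' procedure. It assumes that the cluster assumption means the label is constant on each connected component of a high-density level set {x : p(x) ≥ λ}. Under that assumption a classifier can use the unlabeled points to estimate the density, keep the points above the threshold λ, split them into connected clusters, and give each cluster the majority label of the labeled points that fall inside it. The sketch is 1-D and makes several hypothetical choices: a Gaussian kernel density estimate, a fixed linking radius for forming clusters, the parameter names `bandwidth`, `level` and `link_radius`, and abstaining (returning `None`) outside the estimated level set.

```python
import math
from collections import Counter


def gaussian_kde(points, bandwidth):
    """Return x -> estimated density at x (1-D Gaussian kernel density estimate)."""
    n = len(points)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return density


def level_set_clusters(unlabeled, bandwidth, level, link_radius):
    """Keep unlabeled points whose estimated density is >= level, then split them
    into clusters. In 1-D, chaining sorted points whose gap is <= link_radius gives
    the connected components of the link_radius neighbourhood graph."""
    density = gaussian_kde(unlabeled, bandwidth)
    kept = sorted(x for x in unlabeled if density(x) >= level)
    clusters = []
    for x in kept:
        if clusters and x - clusters[-1][-1] <= link_radius:
            clusters[-1].append(x)
        else:
            clusters.append([x])
    return clusters


def find_cluster(x, clusters, link_radius):
    """Index of the first cluster with a point within link_radius of x, else None."""
    for i, cluster in enumerate(clusters):
        if any(abs(x - p) <= link_radius for p in cluster):
            return i
    return None


def fit_cluster_classifier(unlabeled, labeled, bandwidth, level, link_radius):
    """Hypothetical helper: estimate clusters from unlabeled data, then label each
    cluster by majority vote of the labeled (x, y) pairs that fall in it."""
    clusters = level_set_clusters(unlabeled, bandwidth, level, link_radius)
    votes = [Counter() for _ in clusters]
    for x, y in labeled:
        i = find_cluster(x, clusters, link_radius)
        if i is not None:
            votes[i][y] += 1
    # A cluster that received no labeled point stays unlabeled (None).
    cluster_label = [v.most_common(1)[0][0] if v else None for v in votes]

    def classify(x):
        i = find_cluster(x, clusters, link_radius)
        return None if i is None else cluster_label[i]  # abstain outside the level set

    return classify


# Two dense groups of unlabeled points plus one low-density point at 5.0.
unlabeled = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 9.0, 9.2, 9.4, 9.6, 9.8, 10.0, 5.0]
labeled = [(0.5, "A"), (9.5, "B")]  # a single labeled example per group
classify = fit_cluster_classifier(unlabeled, labeled, bandwidth=0.5, level=0.1, link_radius=0.5)
print(classify(0.1), classify(9.9), classify(5.0))  # A B None
```

Here each group needs only one labeled example, because the unlabeled points alone determine where the clusters are. This mirrors the intuition that, under the cluster assumption, unlabeled data can reduce how many labeled examples are needed.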