Venue: ACM International Conference on Information and Knowledge Management

Semi-Supervised SVMs for Classification with Unknown Class Proportions and a Small Labeled Dataset


Abstract

In the design of practical web page classification systems, one often encounters a situation in which the labeled training set is created by choosing some examples from each class, but the class proportions in this set differ from those in the test distribution to which the classifier will actually be applied. The problem is made worse when the amount of training data is also small. In this paper we explore and adapt binary SVM methods that make use of unlabeled data from the test distribution, viz., Transductive SVMs (TSVMs) and expectation regularization/constraint (ER/EC) methods, to deal with this situation. We empirically show that when the labeled training data is small, a TSVM designed using the class ratio tuned by minimizing the loss on the labeled set yields the best performance; its performance is good even when the deviation between the class ratios of the labeled training set and the test set is quite large. When the labeled training data is sufficiently large, an unsupervised Gaussian mixture model can be used to get a very good estimate of the class ratio in the test set; also, when this estimate is used, both TSVM and ER/EC give their best possible performance, with TSVM coming out superior. The ideas in the paper can be easily extended to multi-class SVMs and MaxEnt models.
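As a rough illustration of the Gaussian-mixture idea mentioned in the abstract, the sketch below estimates a class ratio from unlabeled data alone. It is not the paper's procedure: the abstract does not say what the mixture is fit to, so this assumes one-dimensional classifier scores on the unlabeled test examples, with the positive class taken to be the higher-mean component. The function name `estimate_positive_ratio` and all parameters are hypothetical.

```python
import math
import random


def estimate_positive_ratio(scores, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns the mixing weight of the component with the larger mean,
    read here (by assumption) as the positive-class proportion.
    """
    n = len(scores)
    mean = sum(scores) / n
    # Start both components at the overall variance, floored to avoid zero.
    var0 = max(sum((s - mean) ** 2 for s in scores) / n, 1e-6)
    mu = [min(scores), max(scores)]  # simple initialisation at the extremes
    var = [var0, var0]
    weight = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each score.
        resp = []
        for s in scores:
            dens = [
                weight[k]
                * math.exp(-((s - mu[k]) ** 2) / (2.0 * var[k]))
                / math.sqrt(2.0 * math.pi * var[k])
                for k in range(2)
            ]
            total = dens[0] + dens[1]
            resp.append(dens[1] / total if total > 0.0 else 0.5)

        # M-step: update weights, means, and variances.
        n1 = max(sum(resp), 1e-9)
        n0 = max(n - sum(resp), 1e-9)
        mu[1] = sum(r * s for r, s in zip(resp, scores)) / n1
        mu[0] = sum((1.0 - r) * s for r, s in zip(resp, scores)) / n0
        var[1] = max(sum(r * (s - mu[1]) ** 2 for r, s in zip(resp, scores)) / n1, 1e-6)
        var[0] = max(sum((1.0 - r) * (s - mu[0]) ** 2 for r, s in zip(resp, scores)) / n0, 1e-6)
        weight = [n0 / n, n1 / n]

    return weight[1] if mu[1] >= mu[0] else weight[0]


if __name__ == "__main__":
    # Synthetic scores for illustration only: 30% positives, 70% negatives.
    rng = random.Random(0)
    demo = [rng.gauss(2.0, 0.5) for _ in range(300)]
    demo += [rng.gauss(-2.0, 0.5) for _ in range(700)]
    print(estimate_positive_ratio(demo))
```

An estimate like this could then be supplied as the class-ratio setting of a TSVM or an ER/EC method. How well it works depends on how cleanly the two components separate, which is consistent with the abstract's remark that the estimate is good when the labeled set is sufficiently large.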