Conference: International Joint Conference on Natural Language Processing

A Comparative Study on the Use of Labeled and Unlabeled Data for Large Margin Classifiers

Abstract

We propose to use both labeled and unlabeled data with the Expectation-Maximization (EM) algorithm to estimate a generative model, and to use this model to construct a Fisher kernel. A Naive Bayes generative probability is used to model each document. Through text categorization experiments, we show empirically that (a) the Fisher kernel with labeled and unlabeled data outperforms Naive Bayes classifiers with EM and other methods when a sufficient amount of labeled data is available, (b) the value of additional unlabeled data diminishes when the labeled data set is large enough to estimate a reliable model, (c) the use of categories as latent variables is effective, and (d) larger unlabeled training datasets yield better results.
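The abstract does not spell out the model parameterization, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a multinomial Naive Bayes mixture with the category as the latent variable, add-one smoothing, a Fisher score taken as the plain gradient of the document log-likelihood with respect to the raw parameters (ignoring the sum-to-one constraints), and the Fisher information matrix approximated by the identity. All function names and the toy data are hypothetical.

```python
import math

def posterior(counts, prior, theta):
    """P(class | document) under a multinomial Naive Bayes model."""
    logs = [math.log(p) + sum(c * math.log(t) for c, t in zip(counts, row))
            for p, row in zip(prior, theta)]
    m = max(logs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logs]
    z = sum(exps)
    return [e / z for e in exps]

def em_naive_bayes(labeled, unlabeled, n_classes, vocab_size, iters=10, alpha=1.0):
    """Estimate class priors and word probabilities from labeled and unlabeled documents.

    labeled: list of (word_counts, class_index); unlabeled: list of word_counts.
    """
    docs = [c for c, _ in labeled] + list(unlabeled)
    # Labeled documents keep fixed one-hot responsibilities. Unlabeled ones
    # start at zero, so the first M-step uses the labeled data only.
    resp = [[1.0 if k == y else 0.0 for k in range(n_classes)] for _, y in labeled]
    resp += [[0.0] * n_classes for _ in unlabeled]
    prior, theta = [], []
    for _ in range(iters):
        # M-step: smoothed expected counts (assumed Laplace smoothing, alpha).
        raw = [alpha + sum(r[k] for r in resp) for k in range(n_classes)]
        prior = [v / sum(raw) for v in raw]
        theta = []
        for k in range(n_classes):
            row = [alpha + sum(r[k] * d[w] for r, d in zip(resp, docs))
                   for w in range(vocab_size)]
            theta.append([v / sum(row) for v in row])
        # E-step: update responsibilities of the unlabeled documents only.
        for i in range(len(labeled), len(docs)):
            resp[i] = posterior(docs[i], prior, theta)
    return prior, theta

def fisher_score(counts, prior, theta):
    """Gradient of log p(document) with respect to the priors and word probabilities."""
    post = posterior(counts, prior, theta)
    score = [post[k] / prior[k] for k in range(len(prior))]
    for k, row in enumerate(theta):
        score += [post[k] * c / t for c, t in zip(counts, row)]
    return score

def fisher_kernel(x, y, prior, theta):
    """Dot product of Fisher scores (Fisher information assumed to be the identity)."""
    ux = fisher_score(x, prior, theta)
    uy = fisher_score(y, prior, theta)
    return sum(a * b for a, b in zip(ux, uy))

# Toy usage: 4-word vocabulary, 2 categories, 2 labeled and 3 unlabeled documents.
labeled = [([3, 1, 0, 0], 0), ([0, 0, 2, 3], 1)]
unlabeled = [[2, 2, 0, 0], [0, 1, 3, 2], [4, 0, 0, 1]]
prior, theta = em_naive_bayes(labeled, unlabeled, n_classes=2, vocab_size=4)
k_same = fisher_kernel([3, 1, 0, 0], [2, 2, 0, 0], prior, theta)
k_diff = fisher_kernel([3, 1, 0, 0], [0, 0, 2, 3], prior, theta)
```

In a setup like this, the resulting kernel values could be passed to a large margin classifier such as an SVM as a precomputed kernel matrix; on the toy data, documents that share vocabulary receive a larger kernel value than documents that do not.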
