Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

Semisupervised Learning for a Hybrid Generative/Discriminative Classifier based on the Maximum Entropy Principle

Abstract

This paper presents a method for designing semi-supervised classifiers trained on both labeled and unlabeled samples. We focus on probabilistic semi-supervised classifier design for multi-class, single-labeled classification problems and propose a hybrid approach that combines the advantages of generative and discriminative approaches. In our approach, we first consider a generative model trained on labeled samples and introduce a bias correction model; the two models belong to the same model family but have different parameters. We then construct a hybrid classifier by combining these models based on the maximum entropy principle. To apply our hybrid approach to text classification problems, we employed naive Bayes models as the generative and bias correction models. Our experimental results on four text data sets confirmed that, when there were too few labeled samples to obtain good performance, training with a large number of unlabeled samples substantially improved the generalization ability of our hybrid classifier. We also confirmed that our hybrid approach significantly outperformed both the generative and the discriminative approaches when those two approaches performed comparably. Moreover, we examined the performance of our hybrid classifier when the labeled and unlabeled data distributions differed.
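The abstract does not spell out the combined model, so what follows is only a minimal sketch under stated assumptions. It assumes multinomial naive Bayes over word counts for both models, and it assumes the maximum entropy combination takes the usual log-linear form: each model contributes a log-likelihood term weighted by a scalar (`lam`), plus a per-class bias (`mu`), normalized over classes. The function names `fit_nb`, `nb_log_likelihood` and `hybrid_posterior`, the Laplace smoothing constant, the single soft-labeling step used to fit the bias correction model, and the fixed values of `lam` and `mu` are all hypothetical. The paper's own procedure for estimating these parameters is not reproduced here.

```python
import numpy as np


def fit_nb(X, R, alpha=1.0):
    """Estimate multinomial naive Bayes word log-probabilities.

    X: (n_docs, n_words) word-count matrix.
    R: (n_docs, n_classes) class weights per document: one-hot rows for
       labeled data, or soft posteriors for unlabeled data (assumption).
    alpha: Laplace smoothing constant (assumed value).
    Returns log_theta with shape (n_classes, n_words).
    """
    counts = R.T @ X + alpha  # weighted word counts per class, smoothed
    return np.log(counts / counts.sum(axis=1, keepdims=True))


def nb_log_likelihood(X, log_theta):
    """log p(x | k) per document and class, up to a class-independent constant."""
    return X @ log_theta.T


def hybrid_posterior(X, log_theta, log_psi, lam, mu):
    """Assumed log-linear combination of a generative NB model (theta) and a
    bias correction NB model (psi) from the same family:

        R(k | x) ∝ exp(lam[0] * log p(x|k; theta) + lam[1] * log p(x|k; psi) + mu[k])
    """
    scores = (lam[0] * nb_log_likelihood(X, log_theta)
              + lam[1] * nb_log_likelihood(X, log_psi)
              + mu)
    scores = scores - scores.max(axis=1, keepdims=True)  # avoid overflow in exp
    probs = np.exp(scores)
    return probs / probs.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Toy data: 3-word vocabulary, 2 classes (illustrative only).
    X_lab = np.array([[3, 0, 1], [0, 2, 2]])
    Y_lab = np.array([[1.0, 0.0], [0.0, 1.0]])
    X_unl = np.array([[2, 1, 0], [0, 1, 3], [1, 0, 2]])

    # Generative model: fitted on labeled data only.
    log_theta = fit_nb(X_lab, Y_lab)

    # Bias correction model (hypothetical single step): fitted on unlabeled
    # data soft-labeled by the generative model's posterior.
    R_unl = hybrid_posterior(X_unl, log_theta, log_theta, (1.0, 0.0), np.zeros(2))
    log_psi = fit_nb(X_unl, R_unl)

    lam, mu = (0.5, 0.5), np.zeros(2)  # hypothetical fixed combination weights
    print(hybrid_posterior(X_unl, log_theta, log_psi, lam, mu).round(3))
```

Setting `lam` to `(1, 0)` and `mu` to zero reduces the combination to the plain naive Bayes posterior with a uniform class prior, which makes a convenient sanity check. Giving weight to the second term lets the model fitted on unlabeled data adjust the generative model's class scores.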