Journal: Pattern Analysis and Applications

Exploiting the value of class labels on high-dimensional feature spaces: topic models for semi-supervised document classification


Abstract

We propose a class-based mixture of topic models for classifying documents using both labeled and unlabeled examples (i.e., in a semi-supervised fashion). Most topic models incorporate documents' class labels by generating them after generating the words. In these models, the training class labels have little effect on the estimated topics, as they are effectively treated as just another word among a huge set of word features. In this paper, we propose to increase the influence of class labels on topic models by generating the words in each document conditioned on the class label. We show that our specific generative process improves classification performance with only a small loss in test set log-likelihood. Within our framework, we provide a principled mechanism to control the contributions of the class labels and the word space to the likelihood function. Experiments show that our approach achieves better classification accuracy than some standard semi-supervised and supervised topic models.
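The abstract does not spell out the generative process, so the following is only a minimal illustrative sketch of the general idea of generating words conditioned on the class label. It assumes, purely for illustration, that each class has its own Dirichlet prior over topic proportions (`class_alphas`) and that topics are shared word distributions (`topic_word`); these names, the toy values, and the exact model structure are assumptions of this sketch and are not taken from the paper.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(class_prior, class_alphas, topic_word, doc_len, rng):
    """Generate (label, words): the label is drawn first, then the words
    are generated conditioned on it (hypothetical model structure)."""
    y = sample_categorical(class_prior, rng)        # class label drawn first
    theta = sample_dirichlet(class_alphas[y], rng)  # topic mix depends on y
    words = []
    for _ in range(doc_len):
        z = sample_categorical(theta, rng)          # topic for this token
        words.append(sample_categorical(topic_word[z], rng))  # word id
    return y, words


# Toy setup (all values are made up): 2 classes, 3 topics, 4 vocabulary words.
rng = random.Random(0)
class_prior = [0.5, 0.5]
class_alphas = [[5.0, 0.1, 0.1], [0.1, 0.1, 5.0]]
topic_word = [
    [0.7, 0.1, 0.1, 0.1],
    [0.25, 0.25, 0.25, 0.25],
    [0.1, 0.1, 0.1, 0.7],
]
label, words = generate_document(class_prior, class_alphas, topic_word, 20, rng)
```

In this sketch the label shapes every word in the document through the class-specific topic prior, in contrast to models where the label is generated after the words and so acts like one more feature among many.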
