Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Jul 23-26, 2002, Edmonton

Combining Clustering and Co-training to Enhance Text Classification Using Unlabelled Data



Abstract

In this paper, we present a new co-training strategy that makes use of unlabelled data. It trains two predictors in parallel, with each predictor labelling the unlabelled data for training the other predictor in the next round. Both predictors are support vector machines: one is trained on the original feature space, the other on new features derived by clustering both the labelled and unlabelled data. Hence, unlike standard co-training methods, our method does not require, a priori, the existence of two redundant views either of which could be used for classification, nor does it depend on the availability of two different supervised learning algorithms that complement each other. We evaluated our method with two classifiers and three text benchmarks: WebKB, Reuters newswire articles, and 20 NewsGroups. Our evaluation shows that our co-training technique improves text classification accuracy, especially when the number of labelled examples is very small.
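
The abstract describes the algorithm only at a high level; the sketch below is one possible reading of it in Python with scikit-learn. It assumes binary labels, dense numpy feature matrices, KMeans centroid distances as the cluster-derived second view, and LinearSVC as the SVM — all of these are illustrative assumptions, not the authors' implementation details.

```python
# Minimal sketch of the described co-training loop. Assumes binary labels,
# dense numpy feature matrices, KMeans centroid distances as the second
# (cluster-derived) view, and LinearSVC as the SVM. All parameter values
# are illustrative, not taken from the paper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def cluster_view(X_all, n_clusters=50, seed=0):
    # Second view: represent each document by its distances to centroids
    # obtained from clustering labelled + unlabelled data together.
    km = KMeans(n_clusters=n_clusters, random_state=seed).fit(X_all)
    return km.transform(X_all)

def co_train(X_lab, y_lab, X_unlab, rounds=5, per_round=10):
    X_all = np.vstack([X_lab, X_unlab])
    Z_all = cluster_view(X_all)
    Z_lab, Z_unlab = Z_all[:len(X_lab)], Z_all[len(X_lab):]

    Xa, ya = X_lab.copy(), np.asarray(y_lab)  # training set for SVM A (original view)
    Zb, yb = Z_lab.copy(), np.asarray(y_lab)  # training set for SVM B (cluster view)
    pool = np.arange(len(X_unlab))            # indices of still-unlabelled examples

    for _ in range(rounds):
        svm_a = LinearSVC().fit(Xa, ya)
        svm_b = LinearSVC().fit(Zb, yb)
        if len(pool) == 0:
            break

        # Each predictor picks the unlabelled examples it is most confident
        # about and labels them for the *other* predictor's next round.
        conf_a = np.abs(svm_a.decision_function(X_unlab[pool]))
        conf_b = np.abs(svm_b.decision_function(Z_unlab[pool]))
        pick_a = pool[np.argsort(-conf_a)[:per_round]]
        pick_b = pool[np.argsort(-conf_b)[:per_round]]

        Zb = np.vstack([Zb, Z_unlab[pick_a]])
        yb = np.concatenate([yb, svm_a.predict(X_unlab[pick_a])])
        Xa = np.vstack([Xa, X_unlab[pick_b]])
        ya = np.concatenate([ya, svm_b.predict(Z_unlab[pick_b])])

        pool = np.setdiff1d(pool, np.union1d(pick_a, pick_b))

    return svm_a, svm_b
```

Building the second view from the same data via clustering is what lets this scheme drop the usual co-training requirement of two naturally redundant feature sets.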

