Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Jul 23-26, 2002, Edmonton

Combining Clustering and Co-training to Enhance Text Classification Using Unlabelled Data



Abstract

In this paper, we present a new co-training strategy that makes use of unlabelled data. It trains two predictors in parallel, with each predictor labelling the unlabelled data for training the other predictor in the next round. Both predictors are support vector machines: one is trained on the original feature space, the other on new features derived by clustering both the labelled and unlabelled data. Hence, unlike standard co-training methods, our method does not require, a priori, the existence of two redundant views either of which could be used for classification, nor does it depend on the availability of two different supervised learning algorithms that complement each other. We evaluated our method with two classifiers and three text benchmarks: WebKB, Reuters newswire articles, and 20 NewsGroups. Our evaluation shows that our co-training technique improves text classification accuracy, especially when the number of labelled examples is very small.
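
The abstract describes the algorithm only at a high level; the sketch below is one possible reading of it in Python with scikit-learn. It assumes binary labels, dense numpy feature matrices, KMeans centroid distances as the cluster-derived second view, and LinearSVC as the SVM — all of these are illustrative assumptions, not the authors' implementation details.

```python
# Minimal sketch of the described co-training loop. Assumes binary labels,
# dense numpy feature matrices, KMeans centroid distances as the second
# (cluster-derived) view, and LinearSVC as the SVM. All parameter values
# are illustrative, not taken from the paper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def cluster_view(X_all, n_clusters=50, seed=0):
    # Second view: represent each document by its distances to centroids
    # obtained from clustering labelled + unlabelled data together.
    km = KMeans(n_clusters=n_clusters, random_state=seed).fit(X_all)
    return km.transform(X_all)

def co_train(X_lab, y_lab, X_unlab, rounds=5, per_round=10):
    X_all = np.vstack([X_lab, X_unlab])
    Z_all = cluster_view(X_all)
    Z_lab, Z_unlab = Z_all[:len(X_lab)], Z_all[len(X_lab):]

    Xa, ya = X_lab.copy(), np.asarray(y_lab)  # training set for SVM A (original view)
    Zb, yb = Z_lab.copy(), np.asarray(y_lab)  # training set for SVM B (cluster view)
    pool = np.arange(len(X_unlab))            # indices of still-unlabelled examples

    for _ in range(rounds):
        svm_a = LinearSVC().fit(Xa, ya)
        svm_b = LinearSVC().fit(Zb, yb)
        if len(pool) == 0:
            break

        # Each predictor picks the unlabelled examples it is most confident
        # about and labels them for the *other* predictor's next round.
        conf_a = np.abs(svm_a.decision_function(X_unlab[pool]))
        conf_b = np.abs(svm_b.decision_function(Z_unlab[pool]))
        pick_a = pool[np.argsort(-conf_a)[:per_round]]
        pick_b = pool[np.argsort(-conf_b)[:per_round]]

        Zb = np.vstack([Zb, Z_unlab[pick_a]])
        yb = np.concatenate([yb, svm_a.predict(X_unlab[pick_a])])
        Xa = np.vstack([Xa, X_unlab[pick_b]])
        ya = np.concatenate([ya, svm_b.predict(Z_unlab[pick_b])])

        pool = np.setdiff1d(pool, np.union1d(pick_a, pick_b))

    return svm_a, svm_b
```

Building the second view from the same data via clustering is what lets this scheme drop the usual co-training requirement of two naturally redundant feature sets.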

