IEEE Transactions on Knowledge and Data Engineering

Tri-training: exploiting unlabeled data using three classifiers

Abstract

In many practical data mining applications, such as Web page classification, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning algorithms such as co-training have attracted much attention. In this paper, a new co-training style semi-supervised learning algorithm, named tri-training, is proposed. This algorithm generates three classifiers from the original labeled example set. These classifiers are then refined using unlabeled examples in the tri-training process. In detail, in each round of tri-training, an unlabeled example is labeled for a classifier if the other two classifiers agree on the labeling, under certain conditions. Since tri-training neither requires the instance space to be described with sufficient and redundant views nor puts any constraints on the supervised learning algorithm, its applicability is broader than that of previous co-training style algorithms. Experiments on UCI data sets and application to the Web page classification task indicate that tri-training can effectively exploit unlabeled data to enhance the learning performance.
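
The labeling rule described in the abstract can be illustrated with a small Python sketch. This is a hypothetical illustration, not the paper's implementation, and it rests on several assumptions: the base learner is a toy nearest-centroid classifier (`NearestCentroid`); the three initial classifiers are diversified by bootstrap resampling, done per class here so every label stays present; and the abstract's "certain conditions" are replaced by a simpler stand-in rule: classifier i receives newly labeled examples only when the error of the other two classifiers, estimated on the labeled examples where they agree, has dropped below its value at i's last update (starting at 0.5). The paper's actual conditions are not reproduced. The final prediction is a majority vote of the three classifiers, also an assumption of this sketch. All function names (`stratified_bootstrap`, `pair_error`, `tri_training`, `vote`) are hypothetical.

```python
import random
from collections import Counter


class NearestCentroid:
    """Hypothetical toy base learner: predicts the label of the closest class mean."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for d, v in enumerate(x):
                acc[d] += v
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict(self, x):
        # Squared Euclidean distance to each class centroid; the nearest one wins.
        return min(self.centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])))


def stratified_bootstrap(X, y, rng):
    """Resample with replacement inside each class so every label stays present."""
    Xb, yb = [], []
    for label in dict.fromkeys(y):  # classes in order of first appearance
        members = [x for x, t in zip(X, y) if t == label]
        for _ in members:
            Xb.append(rng.choice(members))
            yb.append(label)
    return Xb, yb


def pair_error(hj, hk, X, y):
    """Error of hj and hk on the labeled examples where they agree (1.0 if never)."""
    agreed = wrong = 0
    for x, t in zip(X, y):
        pj = hj.predict(x)
        if pj == hk.predict(x):
            agreed += 1
            wrong += pj != t
    return wrong / agreed if agreed else 1.0


def tri_training(X_lab, y_lab, X_unlab, max_rounds=10, seed=0):
    rng = random.Random(seed)
    # Three initial classifiers, each trained on its own bootstrap sample of L.
    hs = [NearestCentroid().fit(*stratified_bootstrap(X_lab, y_lab, rng)) for _ in range(3)]
    prev_err = [0.5] * 3  # simplified stand-in threshold, not the paper's condition
    for _ in range(max_rounds):
        updates = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            e = pair_error(hs[j], hs[k], X_lab, y_lab)
            if e < prev_err[i]:
                # Label an unlabeled example for h_i when h_j and h_k agree on it.
                new_X, new_y = [], []
                for x in X_unlab:
                    pj = hs[j].predict(x)
                    if pj == hs[k].predict(x):
                        new_X.append(x)
                        new_y.append(pj)
                updates.append((i, e, new_X, new_y))
        if not updates:
            break  # no classifier passed the condition this round
        # Retrain only after the round, so each new set came from last round's classifiers.
        for i, e, new_X, new_y in updates:
            hs[i] = NearestCentroid().fit(X_lab + new_X, y_lab + new_y)
            prev_err[i] = e
    return hs


def vote(hs, x):
    """Combine the three classifiers by majority vote."""
    return Counter(h.predict(x) for h in hs).most_common(1)[0][0]


X_lab = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
y_lab = ["a", "a", "b", "b"]
X_unlab = [[0.1, 0.3], [0.4, 0.2], [4.8, 5.2], [5.3, 5.1]]
hs = tri_training(X_lab, y_lab, X_unlab)
print(vote(hs, [0.3, 0.3]), vote(hs, [4.9, 5.0]))  # prints: a b
```

Because `tri_training` only calls `fit` and `predict`, any classifier exposing those two methods could stand in for `NearestCentroid`, which mirrors the abstract's point that tri-training places no constraints on the supervised learning algorithm.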