Brazilian Conference on Intelligent Systems

SSL-C4.5: Implementation of a Classification Algorithm for Semi-supervised Learning Based on C4.5


Abstract

Classification algorithms have been studied extensively in many major scientific investigations in recent decades. Many of these algorithms are designed for supervised learning, which requires labeled instances to build effective models. However, in many real-world processes, data labeling is expensive and time-consuming. For this reason, alternative learning paradigms have been proposed to reduce the cost of labeling without a significant loss of model performance. This paper presents the Semi-Supervised Learning C4.5 algorithm (SSL-C4.5), designed for scenarios where only a small part of the data is labeled. SSL-C4.5 was implemented on top of J48, the implementation of the C4.5 algorithm available in the WEKA platform. J48 was modified to incorporate a metric for semi-supervised learning. This metric aims to induce decision tree models that can analyze and extract information from the entire training dataset, including unlabeled instances, even in scenarios where they are the majority. An assessment on eight benchmark datasets showed that the proposal achieved promising results compared to the supervised version of C4.5.
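For context, the supervised baseline mentioned above chooses splits by gain ratio, the standard C4.5 criterion. The abstract does not define the semi-supervised metric that SSL-C4.5 adds, so the sketch below is not that metric: it only illustrates the supervised criterion on a nominal attribute, and it simply skips unlabeled instances. The function names and the convention of marking an unlabeled instance with `None` are assumptions made for this illustration, and the refinements used by J48 (missing values, numeric attributes, the average-gain filter) are omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of a nominal attribute, computed on labeled instances only.

    values[i] is the attribute value of instance i; labels[i] is its class,
    or None when the instance is unlabeled (an assumption of this sketch).
    Unlabeled instances are skipped, as a purely supervised learner would do.
    """
    pairs = [(v, y) for v, y in zip(values, labels) if y is not None]
    n = len(pairs)
    if n == 0:
        return 0.0

    # Group class labels by attribute value.
    groups = {}
    for v, y in pairs:
        groups.setdefault(v, []).append(y)

    base = entropy([y for _, y in pairs])            # entropy before the split
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy([v for v, _ in pairs])      # entropy of the partition

    if split_info == 0.0:                            # attribute has one value
        return 0.0
    return (base - remainder) / split_info
```

In this sketch, an unlabeled instance has no influence on the chosen split. That is the limitation the abstract points to when unlabeled data are the majority.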
