New results in semi-supervised learning using adaptive classifier fusion


Abstract

In typical classification problems, the data used to train a model for each class are often correctly labeled, so fully supervised learning can be used. For example, many illustrative labeled data sets can be found at sources such as the UCI Machine Learning Repository or the KEEL Data Set Repository. However, a growing number of real-world classification problems involve data that contain both labeled and unlabeled samples. The unlabeled samples are assumed to be missing all class label information, and when used as training data they are considered to be of unknown origin (i.e., to the learning system, their actual class membership is completely unknown). When presented with a classification problem containing both labeled and unlabeled training samples, a common approach is to throw out the unlabeled data. In other words, the unlabeled data are not combined with the existing labeled data for learning, which can result in a poorly trained classifier that does not reach its full performance potential. In most cases, the primary reason unlabeled data are rarely used for training is that, depending on the classifier, the optimal model for semi-supervised classification (i.e., a classifier that learns class membership using both labeled and unlabeled samples) can be far too complicated to develop. In previous work, results were shown based on the fusion of binary classifiers to improve performance in multiclass classification problems. In that work, Bayesian methods were used to fuse binary classifier outputs, while selecting the most relevant classifier pairs to improve the overall classifier decision space. Here, this work is extended by developing new algorithms for improving semi-supervised classification performance. Results are demonstrated with real data from the UCI and KEEL repositories.
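
The idea of fusing binary classifier outputs into a multiclass decision can be sketched as follows. This is a minimal illustration and not the authors' algorithm: it assumes one binary classifier per pair of classes, treats the pairwise outputs as conditionally independent, and combines them with a product rule under a uniform prior. The names `fuse_pairwise` and `pair_probs` are hypothetical, and the classifier-pair selection step mentioned in the abstract is not modeled.

```python
from math import exp, log


def fuse_pairwise(pair_probs, classes, prior=None):
    """Fuse pairwise (one-vs-one) binary outputs into a multiclass posterior.

    pair_probs maps (a, b) -> estimated P(class == a | x, class in {a, b}).
    Assumption (not from the source): pairwise outputs are treated as
    conditionally independent, so their log-probabilities are summed.
    """
    # Default to a uniform prior over classes (an assumption of this sketch).
    prior = prior or {c: 1.0 / len(classes) for c in classes}
    log_score = {c: log(prior[c]) for c in classes}

    for (a, b), p in pair_probs.items():
        # Clamp to avoid log(0) when a binary classifier is overconfident.
        p = min(max(p, 1e-9), 1.0 - 1e-9)
        log_score[a] += log(p)        # evidence for class a from this pair
        log_score[b] += log(1.0 - p)  # evidence for class b from this pair

    # Normalize in a numerically stable way.
    top = max(log_score.values())
    unnorm = {c: exp(s - top) for c, s in log_score.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


# Example with three classes and made-up pairwise outputs for one sample.
classes = ["A", "B", "C"]
pair_probs = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.5}
posterior = fuse_pairwise(pair_probs, classes)
predicted = max(posterior, key=posterior.get)
```

In a semi-supervised setting, a fused posterior like this could in principle be used to assign tentative labels to unlabeled samples before retraining; whether the paper does so is not stated in the abstract.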
