
Active learning with partially-labeled data to reduce classification loss.


Abstract

On many occasions in real life we face the problem of classifying partially labeled data, that is, semi-supervised learning. We consider the special case in which the labeled data are scarce or insufficient, and present a principled method that applies active learning to scarcely labeled data to improve the learner's performance. Our active learning algorithm is a co-training procedure that does not require natural view splits. Blum and Mitchell's co-training is a popular semi-supervised algorithm for settings with multiple independent views of the entities to be classified. Classifying web pages is one multi-view example: one view describes a page by the words that occur on it, while another describes it by the words in the hyperlinks that point to it. We therefore begin by analyzing the co-training procedure and then introduce our general active learning algorithm.

This method is based on recent work on the bias-variance decomposition of the 0-1 loss function. We reduce 0-1 loss by first drawing a random pool from the unlabeled data and then, through active learning, labeling the most informative instances in that pool to reduce the learner's variance and bias, and thereby its overall loss.

Finally, we address how best to select instances from the unlabeled data for labeling, using the Jensen-Shannon divergence as one selection criterion. We show that our single-instance selection approaches are superior to a multiple-instance selection approach. Our empirical results show that this technique improves generalization error with less running time than other active learning algorithms.
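The record does not name the decomposition the method builds on; a widely cited candidate from that period, consistent with the abstract's wording, is Domingos's unified bias-variance decomposition for 0-1 loss (ICML 2000). Treating that attribution as an assumption, the expected loss at a point x splits as

    E_{D,t}[ L(t, y) ] = c_1 N(x) + B(x) + c_2 V(x),

where B(x) is the bias (the loss of the main prediction y_m against the optimal prediction y*), V(x) = E_D[ L(y_m, y) ] is the variance, N(x) = E_t[ L(t, y*) ] is the noise, and for two-class 0-1 loss c_2 = +1 on unbiased points (B(x) = 0) and c_2 = -1 on biased ones. Under this reading, labeling instances that cut variance where the learner is already unbiased is exactly what drives overall loss down, which matches the pool-based strategy the abstract describes.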
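The abstract names the Jensen-Shannon divergence as a selection criterion but not the surrounding procedure. The following Python sketch shows one plausible reading, under the assumption that two co-trained classifiers each output class-probability vectors for a pool of unlabeled instances and the instance on which they disagree most is queried; the function names are illustrative, not from the dissertation.

    import numpy as np

    def js_divergence(p, q, eps=1e-12):
        # Jensen-Shannon divergence between two class-probability vectors.
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        p, q = p / p.sum(), q / q.sum()
        m = 0.5 * (p + q)
        kl = lambda a, b: float(np.sum(a * np.log(a / b)))
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    def select_instance(probs_a, probs_b):
        # Index of the pool instance on which the two learners'
        # predictive distributions disagree most (highest JS divergence).
        scores = [js_divergence(pa, pb) for pa, pb in zip(probs_a, probs_b)]
        return int(np.argmax(scores))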

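A single round of the pool-based loop outlined in the abstract could then look as follows. This continues the sketch above (reusing select_instance); y_oracle, an array of true labels standing in for the human annotator, is hypothetical, both classifiers are assumed already fitted on the labeled set, and any estimators exposing fit and predict_proba (e.g., scikit-learn classifiers) would do.

    def active_learning_round(clf_a, clf_b, X_lab, y_lab, X_unlab, y_oracle,
                              pool_size=50, rng=None):
        # One round: draw a random pool from the unlabeled data, query the
        # label of its most informative instance, and retrain both learners.
        if rng is None:
            rng = np.random.default_rng(0)
        pool = rng.choice(len(X_unlab), size=min(pool_size, len(X_unlab)),
                          replace=False)
        probs_a = clf_a.predict_proba(X_unlab[pool])
        probs_b = clf_b.predict_proba(X_unlab[pool])
        best = pool[select_instance(probs_a, probs_b)]
        X_lab = np.vstack([X_lab, X_unlab[best:best + 1]])
        y_lab = np.append(y_lab, y_oracle[best])      # simulated oracle query
        X_unlab = np.delete(X_unlab, best, axis=0)
        y_oracle = np.delete(y_oracle, best)
        clf_a.fit(X_lab, y_lab)
        clf_b.fit(X_lab, y_lab)
        return clf_a, clf_b, X_lab, y_lab, X_unlab, y_oracle

Querying one instance per round reflects the abstract's finding that single-instance selection outperforms selecting multiple instances at once.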
Record details

  • Author: Aminian, Minoo
  • Author's affiliation: State University of New York at Albany
  • Degree grantor: State University of New York at Albany
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2006
  • Pagination: 70 p.
  • Total pages: 70
  • Format: PDF
  • Language: English
  • CLC classification: Automation and computer technology
  • Keywords
