Adaptive classifier design using labelled and unlabelled data.

Abstract

In statistical pattern recognition, the goal is to accurately classify samples based on a set of descriptive predictor variables (features). The design of pattern recognition systems involves three major tasks: acquisition of training data; data pre-processing; and learning a classification function that accurately predicts the class label of new samples. This thesis develops statistical algorithms for all three tasks, as explained below.

First, consider pre-processing: in this stage of classifier design, the objective is to represent the data in a form that allows the classification function to be learnt accurately in the subsequent stage. Since many of the initially postulated predictor variables may be irrelevant, learning an accurate classifier from limited training data requires identifying the predictor variables that carry information pertinent to the classification. Previous work on this task, often referred to as feature selection, has largely addressed it in isolation from classifier design.

The first part of this thesis develops algorithms that combine these two tasks and solve them jointly. Removing the artificial distinction between these tasks, the algorithms presented here identify only those features that are most useful in performing the classification itself. For the special case of binary linear classifiers, non-trivial theoretical error bounds are derived for our algorithm. These bounds are significantly tighter than previous results such as those derived from Vapnik-Chervonenkis theory.

The second part of this thesis develops a new semi-supervised algorithm for learning the classification function. Conventional supervised classifier learning methods assume access to training data associated with known class labels. However, in many problems data is initially gathered unlabelled, and the subsequent acquisition of labelled information is costly and/or time-consuming. Given access to a small number of labelled samples and an abundance of unlabelled data, semi-supervised algorithms that learn from both have been the focus of much research in the last few years.

The last part of the thesis develops methods for adaptive collection of training data in a way that focuses on learning the classification function accurately, under the constraints of a limited data-acquisition budget. Compared with passively learning from examples provided by a teacher, a student can learn a concept much faster by actively asking the teacher questions to clarify doubts. Based on this idea, active data query selection is performed using a mutual-information-based criterion that explicitly uses the labelled and unlabelled data, as well as the co-training information, to decide what additional data will be most useful for learning the classifier. (Abstract shortened by UMI.)
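
The active query selection idea described in the abstract can be illustrated with a small sketch. It is not the thesis's algorithm: it assumes a binary logistic classifier, approximates the uncertainty about the classifier by a handful of hypothetical weight vectors (`weight_samples`), ignores the co-training information, and scores each unlabelled sample by the estimated mutual information between its unknown label and the weights. All function names and data here are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) label."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def mutual_information(x, weight_samples):
    """Estimate I(label; weights | x) from a set of plausible weight vectors.

    Assumption: weight_samples stands in for samples from a posterior over
    the parameters of a linear logistic classifier.
    """
    probs = [sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)))
             for w in weight_samples]
    mean_p = sum(probs) / len(probs)
    expected_entropy = sum(binary_entropy(p) for p in probs) / len(probs)
    # Entropy of the averaged prediction minus the average entropy:
    # large only when the plausible classifiers disagree about the label.
    return binary_entropy(mean_p) - expected_entropy


def select_query(unlabelled, weight_samples):
    """Return the index of the unlabelled sample with the highest score."""
    scores = [mutual_information(x, weight_samples) for x in unlabelled]
    return max(range(len(unlabelled)), key=scores.__getitem__)


# Hypothetical data: three plausible classifiers and a pool of three samples.
weight_samples = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
pool = [[3.0, 3.0], [3.0, -3.0], [0.0, 0.0]]
best = select_query(pool, weight_samples)
```

In this toy pool the second sample is selected, because the three classifiers disagree strongly about its label. The sample at the origin has a maximally uncertain prediction under every classifier, yet its score is zero: labelling it would not help distinguish between the candidate classifiers, which is the distinction a mutual-information criterion draws relative to plain predictive uncertainty.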

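Bibliographic record

  • Author

    Krishnapuram, Balaji

  • Author affiliation

    Duke University

  • Degree-granting institution: Duke University
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2004
  • Pages: 125 p.
  • Total pages: 125
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Radio electronics; telecommunication technology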