Adaptive classifier design using labelled and unlabelled data.

Abstract

In statistical pattern recognition, the goal is to accurately classify samples based on a set of descriptive predictor variables (features). The design of pattern recognition systems involves three major tasks: acquisition of training data, data pre-processing, and learning a classification function that accurately predicts the class label of new samples. This thesis develops statistical algorithms for all three tasks, as explained below.

First, consider pre-processing. In this stage of classifier design, the objective is to represent the data in a form that allows the classification function to be learnt accurately in the subsequent stage. Since many of the initially postulated predictor variables may be irrelevant, learning an accurate classifier from limited training data requires identifying the predictor variables that carry information pertinent to the classification. This task is often referred to as feature selection, and previous work has largely addressed it in isolation from classifier design.

The first part of this thesis develops algorithms that combine these two tasks and solve them jointly. By removing the artificial distinction between the tasks, the algorithms presented here identify only those features that are most useful in performing the classification itself. For the special case of binary linear classifiers, non-trivial theoretical error bounds are derived for the algorithm. These bounds are significantly tighter than previous results such as those derived from Vapnik-Chervonenkis theory.

The second part of this thesis develops a new semi-supervised algorithm for learning the classification function. Conventional supervised classifier learning methods assume access to training data with known class labels. However, in many problems the data are initially gathered unlabelled, and the subsequent acquisition of labels is costly and/or time-consuming. Semi-supervised algorithms, which learn from a small number of labelled samples together with an abundance of unlabelled data, have been the focus of much research in the last few years.

The last part of the thesis develops methods for adaptive collection of training data that focus on learning the classification function accurately under a limited data-acquisition budget. Compared with passively learning from examples provided by a teacher, a student can learn a concept much faster by actively asking the teacher questions to clarify doubts. Based on this idea, active data query selection is performed using a mutual-information-based criterion that explicitly uses the labelled and unlabelled data, as well as the co-training information, to decide which additional data will be most useful for learning the classifier. (Abstract shortened by UMI.)
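The abstract does not spell out how feature selection and classifier learning are solved jointly. As a rough illustration of the general idea only, the sketch below fits a binary linear classifier with an L1 penalty, so that the weights of unhelpful features are driven exactly to zero while the classifier is being trained. The L1-penalised logistic regression, the proximal-gradient solver, the function name `fit_sparse_logistic`, the synthetic data, and all parameter values are assumptions made for this example; they are not taken from the thesis.

```python
import numpy as np

def fit_sparse_logistic(X, y, l1=0.1, lr=0.1, n_iter=2000):
    """Hypothetical helper: L1-penalised logistic regression fitted by
    proximal gradient descent. X has shape (n, d); y holds 0/1 labels."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(y = 1)
        w = w - lr * (X.T @ (p - y) / n)         # gradient step on the log-loss
        b = b - lr * np.mean(p - y)              # bias is not penalised
        # Soft-thresholding: the proximal step for the L1 penalty.
        # Weights whose magnitude falls below lr * l1 become exactly zero.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
    return w, b

# Synthetic data (an assumption for illustration): 10 features,
# of which only features 0 and 1 determine the class label.
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
y = (X[:, 0] - X[:, 1] > 0).astype(float)

w, b = fit_sparse_logistic(X, y)
selected = np.flatnonzero(w)                     # indices of features kept
accuracy = np.mean(((X @ w + b) > 0) == (y == 1))
print("selected features:", selected.tolist())
print("training accuracy:", accuracy)
```

Here the set of selected features is a by-product of training the classifier rather than the result of a separate filtering stage, which is the sense in which the two tasks are solved together. A larger `l1` value discards more features at the cost of shrinking the remaining weights.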

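Bibliographic record

Author: Krishnapuram, Balaji
Author affiliation: Duke University
Degree-granting institution: Duke University
Subject: Engineering, Electronics and Electrical
Degree: Ph.D.
Year: 2004
Pages: 125 p.
Total pages: 125
Original format: PDF
Language of text: eng
Chinese Library Classification: Radio electronics, telecommunications technology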
