IEEE Transactions on Neural Networks and Learning Systems

A Maximum Entropy Framework for Semisupervised and Active Learning With Unknown and Label-Scarce Classes



Abstract

We investigate semisupervised learning (SL) and pool-based active learning (AL) of a classifier for domains with label-scarce (LS) and unknown categories, i.e., defined categories for which there are initially no labeled examples. This scenario manifests, e.g., when a category is rare or expensive to label. There are several learning issues when there are unknown categories: 1) it is a priori unknown which subset of (possibly many) measured features is needed to discriminate unknown from common classes and 2) label scarcity suggests that overtraining is a concern. Our classifier exploits the inductive bias that an unknown class consists of the subset of the unlabeled pool's samples that are atypical (relative to the common classes) with respect to certain key (albeit a priori unknown) features and feature interactions. Accordingly, we treat negative log-p-values on raw features as nonnegatively weighted derived feature inputs to our class posterior, with zero weights identifying irrelevant features. Through a hierarchical class posterior, our model accommodates multiple common classes, multiple LS classes, and unknown classes. For learning, we propose a novel semisupervised objective customized for the LS/unknown category scenarios. While several works minimize class decision uncertainty on unlabeled samples, we instead preserve this uncertainty [maximum entropy (maxEnt)] to avoid overtraining. Our experiments on a variety of UCI machine learning (ML) domains show: 1) the use of p-value features coupled with weight constraints leads to sparse solutions and gives significant improvement over the use of raw features and 2) for LS SL and AL, unlabeled samples are helpful, and should be used to preserve decision uncertainty (maxEnt), rather than to minimize it, especially during the early stages of AL. Our AL system, leveraging a novel sample-selection scheme, discovers unknown classes and discriminates LS classes from common ones, with sparing use of oracle labeling.
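The two central ingredients of the abstract, negative log-p-value derived features and an entropy-preserving (maxEnt) semisupervised objective, are illustrated in the hedged Python sketch below. This is not the authors' implementation: the per-feature Gaussian nulls, the plain multinomial logistic posterior standing in for the paper's hierarchical class posterior, the clipping used as a crude nonnegativity constraint, and all names (neg_log_pvalue_features, semisup_objective, lam) are assumptions made for illustration only.

```python
# Hedged sketch (not the paper's code): two ideas from the abstract.
#   1) negative log-p-value "derived features", with per-feature null models
#      fit on the labeled common-class pool (Gaussian nulls are an assumption);
#   2) a semisupervised objective that rewards, rather than minimizes, the
#      entropy of the class posterior on unlabeled samples (maxEnt).
import numpy as np
from scipy import stats
from scipy.special import log_softmax


def neg_log_pvalue_features(X, X_common):
    """Two-sided Gaussian p-values per raw feature; returns -log p.
    Samples atypical relative to the common classes get large values."""
    mu = X_common.mean(axis=0)
    sd = X_common.std(axis=0) + 1e-8
    z = np.abs((X - mu) / sd)
    p = 2.0 * stats.norm.sf(z)                 # two-sided tail probability
    return -np.log(np.clip(p, 1e-12, 1.0))


def semisup_objective(W, b, U_lab, y_lab, U_unl, lam=1.0):
    """Labeled cross-entropy minus lam * mean posterior entropy on the
    unlabeled pool, so minimizing this objective preserves uncertainty.
    W holds nonnegative weights on the derived features; zero rows would
    mark irrelevant features (clipping is a stand-in for the constraint)."""
    W = np.maximum(W, 0.0)                     # nonnegative feature weights
    logp_lab = log_softmax(U_lab @ W + b, axis=1)
    ce = -np.mean(logp_lab[np.arange(len(y_lab)), y_lab])
    logp_unl = log_softmax(U_unl @ W + b, axis=1)
    ent = -np.mean(np.sum(np.exp(logp_unl) * logp_unl, axis=1))
    return ce - lam * ent                      # maxEnt term rewards uncertainty
```

Minimizing semisup_objective drives labeled cross-entropy down while rewarding high posterior entropy on the unlabeled pool; that is the sense in which the framework preserves, rather than minimizes, decision uncertainty during semisupervised and early-stage active learning.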

