D-Confidence: an active learning strategy to reduce label disclosure complexity in the presence of imbalanced class distributions

Nuno Filipe Escudeiro; Alípio Mário Jorge

Abstract

In some classification tasks, such as those related to the automatic building and maintenance of text corpora, it is expensive to obtain labeled instances to train a classifier. In such circumstances it is common to have massive corpora where a few instances are labeled (typically a minority) while the others are not. Semi-supervised learning techniques try to leverage the intrinsic information in unlabeled instances to improve classification models. However, these techniques assume that the labeled instances cover all the classes to be learned, which might not be the case. Moreover, when the class distribution is imbalanced, getting labeled instances from minority classes might be very costly, requiring extensive labeling, if queries are selected at random. Active learning allows asking an oracle to label new instances, which are selected according to given criteria, with the aim of reducing the labeling effort. D-Confidence is an active learning approach that is effective in the presence of imbalanced training sets. In this paper we evaluate the performance of d-Confidence in comparison to its baseline criteria over tabular and text datasets. We provide empirical evidence that d-Confidence reduces label disclosure complexity (which we have defined as the number of queries required to identify instances from all the classes to be learned) in the presence of imbalanced data.
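
The definition of label disclosure complexity given above can be illustrated with a minimal Python sketch. It only measures the metric under random query selection, the costly baseline the abstract describes; it does not implement d-Confidence. The function names, the class labels and the pool composition (990/8/2) are hypothetical and chosen purely for illustration.

```python
import random
from typing import Hashable, Optional, Sequence, Set


def label_disclosure_complexity(
    query_labels: Sequence[Hashable], classes: Set[Hashable]
) -> Optional[int]:
    """Number of queries needed until every class in `classes` has been
    seen at least once, following the definition in the abstract.
    Returns None if the query sequence never covers all classes."""
    if not classes:
        return 0  # nothing to disclose
    seen = set()
    for n_queries, label in enumerate(query_labels, start=1):
        if label in classes:
            seen.add(label)
        if seen == classes:
            return n_queries
    return None


def random_query_order(labels: Sequence[Hashable], seed: int) -> list:
    """Simulate an oracle queried in uniformly random order."""
    rng = random.Random(seed)
    order = list(labels)
    rng.shuffle(order)
    return order


# Hypothetical imbalanced pool: one majority class and two rare classes.
pool = ["majority"] * 990 + ["rare_1"] * 8 + ["rare_2"] * 2
classes = set(pool)

# Repeat the random baseline over several seeds and average the cost.
costs = [
    label_disclosure_complexity(random_query_order(pool, seed), classes)
    for seed in range(200)
]
print("mean queries to see all classes:", sum(costs) / len(costs))
```

Under these assumptions, the mean printed at the end shows how many random queries are typically spent before the rarest class appears, which is the cost an active learning criterion aims to reduce.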

Bibliographic details

Source: Journal of the Brazilian Computer Society, 2012, issue 4, 20 pages
Authors: Nuno Filipe Escudeiro; Alípio Mário Jorge
Original format: PDF
