Neurocomputing

Certainty-based active learning for sampling imbalanced datasets

Abstract

Active learning aims to learn an accurate classifier from as few label queries as possible. Targeting practical applications, we propose a Certainty-Based Active Learning (CBAL) algorithm that addresses the imbalanced data classification problem in active learning. The importance of each unlabeled sample is measured within an explored neighborhood, so the measurement is not affected by irrelevant samples that might overwhelm the minority class. To handle the agnostic case, IWAL-ERM (importance-weighted active learning with empirical risk minimization) is integrated into our approach at no additional cost. CBAL thus determines the query probability of each unlabeled sample within its explored neighborhood. This neighborhood is explored incrementally, so its size does not need to be defined in advance. Our theoretical analysis shows that CBAL achieves a polynomial improvement in the number of label queries over passive learning. Experimental results on synthetic and real-world datasets show that CBAL can identify informative samples and handle the imbalanced data classification problem in active learning.
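
The abstract does not give CBAL's formulas. The sketch below is only a rough illustration of the two ingredients it names, and it is not the authors' method. First, it grows a neighborhood around an unlabeled point one labeled neighbor at a time until the neighbors agree with a chosen certainty, and turns the remaining disagreement into a query probability. Second, it makes an IWAL-style query decision in which a queried label is stored with importance weight 1/p. The stopping rule, the probability formula, and the names `query_probability`, `iwal_step`, `certainty`, `min_k`, and `p_min` are all hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Labeled = Tuple[Point, int, float]  # (features, label, importance weight)


def query_probability(
    x: Point,
    labeled: Sequence[Labeled],
    certainty: float = 0.9,
    min_k: int = 3,
    p_min: float = 0.05,
) -> float:
    """Hypothetical neighborhood-based query probability (not CBAL's formula).

    Labeled points join the neighborhood of x one at a time, nearest first,
    so no neighborhood size is fixed in advance. Growth stops once at least
    `min_k` neighbors are included and the majority label holds a
    `certainty` share of them. Disagreement in the explored neighborhood is
    mapped to a probability in [p_min, 1].
    """
    if not labeled:
        return 1.0  # nothing labeled yet: always query
    n_classes = max(2, len({y for _, y, _ in labeled}))
    ordered = sorted(labeled, key=lambda item: math.dist(x, item[0]))
    counts: Dict[int, int] = {}
    majority_share = 1.0
    for k, (_, y, _) in enumerate(ordered, start=1):
        counts[y] = counts.get(y, 0) + 1
        majority_share = max(counts.values()) / k
        if k >= min_k and majority_share >= certainty:
            break  # neighborhood is certain enough; stop exploring
    # 0 when all explored neighbors agree, 1 when they are evenly split
    # across all known classes.
    uncertainty = (1.0 - majority_share) / (1.0 - 1.0 / n_classes)
    return min(1.0, max(p_min, uncertainty))


def iwal_step(
    x: Point,
    labeled: List[Labeled],
    oracle: Callable[[Point], int],
    rng: random.Random,
) -> bool:
    """IWAL-style decision: query x with probability p, weight its label by 1/p."""
    p = query_probability(x, labeled)
    if rng.random() < p:
        # The 1/p weight makes the importance-weighted empirical loss an
        # unbiased estimate of the loss over every point seen in the stream.
        labeled.append((x, oracle(x), 1.0 / p))
        return True
    return False


if __name__ == "__main__":
    rng = random.Random(0)

    # Hypothetical imbalanced 1-D stream: about 5% minority points near x = 5.
    def oracle(p: Point) -> int:
        return int(p[0] > 4.0)

    stream = [
        (rng.gauss(5.0, 0.5),) if rng.random() < 0.05 else (rng.gauss(0.0, 1.0),)
        for _ in range(500)
    ]
    labeled: List[Labeled] = []
    queries = sum(iwal_step(x, labeled, oracle, rng) for x in stream)
    print(f"queried {queries} of {len(stream)} points")
```

Because the neighborhood grows until the stopping rule fires, the number of neighbors examined varies from point to point instead of being a fixed k. Keeping `p_min` above zero bounds every importance weight by 1/`p_min`, which keeps the variance of the weighted loss in check.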