Journal: Knowledge-Based Systems

Balancing Exploration and Exploitation: A novel active learner for imbalanced data

Abstract

Active learning has attracted great interest from researchers because it aims to reduce the time, cost, and effort of labeling data in many applications. Active learning aims to generate or select the smallest possible amount of training data that ensures strong classification performance in the test phase. An active learner carries out two main steps: (i) selecting a set of promising queries from unlabeled data, and (ii) annotating the selected queries. Most active learners choose either the most informative or the most representative instances for annotation. In this paper, we combine these two criteria for query selection. First, in the exploration phase, the proposed algorithm explores the search space and tries in each iteration to visit new regions. This improves its ability to explore the space of minority classes in imbalanced data. Second, in the exploitation phase, the goal is to generate a new point in an uncertain region, which is expected to lie around the decision boundaries of the target functions. Some variants of the proposed algorithm do not require any labeled or unlabeled data in advance. Comparatively little existing work addresses this scenario. Experiments on synthetic and real datasets with different dimensions and imbalance ratios indicate that the proposed algorithm has significant advantages compared to various well-known active learners. (C) 2020 Elsevier B.V. All rights reserved.
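The two phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes farthest-point sampling as a stand-in for exploration and the midpoint of the closest pair of differently labeled points as a stand-in for generating a point near the decision boundary. The names `explore`, `exploit`, `active_learn`, and `toy_oracle`, as well as the budget split, are hypothetical.

```python
import numpy as np

def explore(queried, bounds, rng, n_candidates=200):
    """Exploration (assumed rule): pick the random candidate that is
    farthest from every point queried so far, to visit a new region."""
    low, high = bounds
    candidates = rng.uniform(low, high, size=(n_candidates, len(low)))
    if len(queried) == 0:
        return candidates[0]
    # Distance from each candidate to each already-queried point.
    diffs = candidates[:, None, :] - np.asarray(queried)[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return candidates[dists.min(axis=1).argmax()]

def exploit(X, y):
    """Exploitation (assumed rule): generate the midpoint of the closest
    pair of points with different labels, i.e. a point in an uncertain region."""
    X, y = np.asarray(X), np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    dists = np.linalg.norm(pos[:, None, :] - neg[None, :, :], axis=2)
    i, j = np.unravel_index(dists.argmin(), dists.shape)
    return (pos[i] + neg[j]) / 2.0

def active_learn(oracle, bounds, budget=30, n_explore=15, seed=0):
    """Query synthesis loop: no labeled or unlabeled pool is needed up front."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for t in range(budget):
        seen_both_classes = len(set(y)) == 2
        if t < n_explore or not seen_both_classes:
            x = explore(X, bounds, rng)
        else:
            x = exploit(X, y)
        X.append(x)
        y.append(oracle(x))  # annotation step
    return np.array(X), np.array(y)

def toy_oracle(x):
    """Hypothetical imbalanced target: a small minority disc in the unit square."""
    return int(np.linalg.norm(x - np.array([0.8, 0.8])) < 0.15)

X, y = active_learn(toy_oracle, ([0.0, 0.0], [1.0, 1.0]))
print(f"queried {len(X)} points, {int(y.sum())} from the minority class")
```

In this sketch the loop keeps exploring until both classes have been observed, because the midpoint rule needs at least one labeled point from each class. This is one simple way to handle a minority class that is hard to find; the paper may handle it differently.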
