Conference: International Conference on Computer Engineering, Information Science Application Technology

A Novel Sampling Strategy for Active Learning over Evolving Stream Data

Abstract

In classification tasks, data labeling is expensive and time-consuming; hence active learning, which queries labels for a small, representative portion of the data, is becoming increasingly important. However, few works consider the challenges of the data stream setting, because most active learning methods are designed for the non-streaming setting. To address this gap, we combine the evidence-based uncertainty sampling strategy with the split sampling strategy and propose a new sampling strategy for active learning over evolving stream data that takes advantage of the strengths of each. First, the original data stream is randomly divided into two sub-streams. Instances from one sub-stream are labeled according to the high evidence-focused uncertainty strategy, while instances from the other sub-stream are labeled by the random strategy in order to detect true concept drifts. Second, we introduce a sliding window into the high evidence-focused uncertainty strategy to determine whether an instance is a conflict-uncertainty instance. Our strategy thus addresses the effective use of evidence in the data stream setting and can choose more representative instances from evolving data streams for training a model. Finally, in experiments on four benchmark datasets, the proposed approach shows good predictive performance compared with state-of-the-art active learning strategies.
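The two-sub-stream idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not say how evidence is computed or how the sliding window decides whether an instance is conflict-uncertain. Here `predict_proba` and `evidence` are hypothetical caller-supplied functions, the window simply stores recent evidence scores, and "high evidence" is assumed to mean "at or above the window median". The parameters `split_prob`, `random_rate`, `margin`, and `window_size` are likewise illustrative.

```python
import random
from collections import deque
from statistics import median


def select_for_labeling(stream, predict_proba, evidence, split_prob=0.5,
                        random_rate=0.1, margin=0.2, window_size=50, seed=0):
    """Return (index, reason) pairs for the instances chosen for labeling.

    predict_proba(x): hypothetical model score in [0, 1] for the positive class.
    evidence(x): hypothetical scalar evidence score for x (assumption).
    """
    rng = random.Random(seed)
    window = deque(maxlen=window_size)  # recent evidence scores (assumption)
    chosen = []
    for i, x in enumerate(stream):
        if rng.random() < split_prob:
            # Random sub-stream: label a fixed fraction regardless of the
            # model's confidence, so drift in confident regions is still seen.
            if rng.random() < random_rate:
                chosen.append((i, "random"))
        else:
            # Uncertainty sub-stream: label only instances that are both
            # close to the decision boundary and high in evidence relative
            # to the recent window (a stand-in for "conflict-uncertainty").
            uncertain = abs(predict_proba(x) - 0.5) < margin
            score = evidence(x)
            high_evidence = not window or score >= median(window)
            window.append(score)
            if uncertain and high_evidence:
                chosen.append((i, "uncertainty"))
    return chosen


if __name__ == "__main__":
    # Toy stream: each instance is its own predicted probability.
    toy_stream = [0.5, 0.9, 0.45, 0.1, 0.55]
    picks = select_for_labeling(toy_stream, predict_proba=lambda x: x,
                                evidence=lambda x: 1.0)
    print(picks)
```

Routing a share of the stream to random labeling spends some of the labeling budget on instances the model is confident about, which is what allows changes far from the decision boundary to be noticed at all.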