
Active Learning for Data Streams under Concept Drift and Concept Evolution



Abstract

Data stream classification is an important problem, but it poses many challenges. Since the length of the data is theoretically infinite, it is impractical to store and process all the historical data. Data streams also experience changes in their underlying distribution (concept drift), so the classifier must adapt. Another challenge of data stream classification is the possible emergence and disappearance of classes, known as the concept evolution problem. On top of these challenges, acquiring labels for such large volumes of data is expensive. In this paper, we propose a stream-based active learning (AL) strategy (SAL) that handles the aforementioned challenges. SAL aims to query the labels of samples so as to optimize the expected future error. It handles concept drift and concept evolution by adapting to changes in the stream. Furthermore, as part of the error reduction process, SAL handles the sampling bias problem and queries the samples that caused the change, i.e., drifted samples or samples coming from new classes. To tackle the lack of prior knowledge about the streaming data, non-parametric Bayesian modelling is adopted, namely two representations of the Dirichlet process: Dirichlet mixture models and the stick-breaking process. Empirical results obtained on real-world benchmarks show the high performance of the proposed SAL method compared to state-of-the-art methods.
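For readers unfamiliar with the stick-breaking representation of the Dirichlet process mentioned in the abstract, the following is a minimal sketch of the generic textbook construction, not the authors' SAL implementation. The function name `stick_breaking_weights`, the truncation level `num_components`, and the concentration value `alpha=2.0` are assumptions chosen for illustration only.

```python
import random


def stick_breaking_weights(alpha, num_components, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    alpha is the concentration parameter; num_components is a
    hypothetical truncation level (the true process is infinite).
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(num_components - 1):
        # Break off a Beta(1, alpha) fraction of the remaining stick.
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    # Truncation: the last component absorbs the leftover mass.
    weights.append(remaining)
    return weights


rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights = stick_breaking_weights(alpha=2.0, num_components=10, rng=rng)
print(len(weights), sum(weights))
```

Smaller values of `alpha` concentrate the mass on the first few components, while larger values spread it over more components. This is why such a prior can accommodate an unknown and changing number of clusters or classes in a stream.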
