Journal: Evolving Systems

Using a classifier pool in accuracy based tracking of recurring concepts in data stream classification

Abstract

Data streams have unique properties that make them well suited to precise modeling of many real-world data mining applications. Their most challenging property is the occurrence of "concept drift". Recurring concepts are a type of concept drift seen in most real-world problems. Detecting recurring concepts makes it possible to exploit knowledge obtained earlier in the learning process, so the learner adapts quickly whenever a concept reappears. In this paper, we propose a learning algorithm called Pool and Accuracy based Stream Classification, with several variations, which maintains a pool of classifiers to track recurring concepts. Each classifier describes one existing concept. Consecutive batches of instances are first classified by the pool of classifiers, using one of two approaches: the active classifier method or the weighted classifiers method. The true labels are then revealed, and the pool is updated at the end of the batch using one of the following methods: exact Bayesian, Bayesian, or heuristic. Because the algorithm may assign multiple classifiers to a single concept, a classifier merging process is used to resolve this problem. Experimental results on real and artificial datasets show that the weighted classifiers method is effective on datasets with sudden concept drift. In addition, the proposed updating methods outperform existing algorithms on datasets with arbitrary attributes. Finally, some experiments show the advantage of using the merging process on large datasets.
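
The classify-then-update loop described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the toy threshold rules, and the rule "new weight = accuracy on the latest batch" are all assumptions chosen for illustration. The abstract does not specify the exact Bayesian, Bayesian, or heuristic update formulas, and the merging step is omitted here.

```python
from collections import defaultdict

def batch_accuracy(clf, batch):
    """Fraction of (x, y) pairs in the batch that clf labels correctly."""
    return sum(clf(x) == y for x, y in batch) / len(batch)

def predict_active(pool, weights, x):
    """Active classifier method: only the highest-weighted classifier predicts."""
    best = max(range(len(pool)), key=lambda i: weights[i])
    return pool[best](x)

def predict_weighted(pool, weights, x):
    """Weighted classifiers method: every classifier votes with its weight."""
    scores = defaultdict(float)
    for clf, w in zip(pool, weights):
        scores[clf(x)] += w
    return max(scores, key=scores.get)

def process_batch(pool, weights, batch, predict=predict_weighted):
    """Classify a batch first, then use the revealed labels to reweight the pool."""
    predictions = [predict(pool, weights, x) for x, _ in batch]
    # Assumption: the new weight is simply each classifier's accuracy on this
    # batch. This stands in for the paper's update methods, which are not
    # given in the abstract.
    new_weights = [batch_accuracy(clf, batch) for clf in pool]
    return predictions, new_weights

# Hypothetical pool: two fixed toy rules, one per concept.
pool = [lambda x: int(x > 0.5), lambda x: int(x <= 0.5)]
weights = [1.0, 1.0]
concept_a = [(0.1, 0), (0.9, 1), (0.7, 1), (0.2, 0)]
concept_b = [(x, 1 - y) for x, y in concept_a]  # same inputs, flipped labels

_, weights = process_batch(pool, weights, concept_a)      # weights favor rule 0
_, weights = process_batch(pool, weights, concept_b)      # drift: weights favor rule 1
preds, weights = process_batch(pool, weights, concept_a)  # concept A recurs
```

In this toy run the first batch after a sudden drift is still classified with the stale weights, and the pool recovers as soon as the labels are revealed. Because the classifier for the earlier concept was kept in the pool, no retraining is needed when that concept reappears.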