Journal: Knowledge and Information Systems

Modeling recurring concepts in data streams: a graph-based framework

Abstract

Classifying a stream of non-stationary data with recurrent drift is a challenging task that has attracted considerable interest in recent years. All existing approaches to handling recurrent concepts maintain a pool of concepts/classifiers and reuse that pool for future classifications, which reduces the error on instances drawn from a recurring concept. However, the number of classifiers in the pool usually grows very quickly, because accurately detecting an underlying concept is a challenging task in itself. As a result, the pool may contain many concepts that represent the same underlying concept. This paper proposes GraphPool, a framework that refines the pool of concepts by applying a merging mechanism whenever necessary. After receiving a new batch of data, we extract a concept representation from the current batch that takes the correlation among features into account. We then compare the current batch representation with the concept representations in the pool using a statistical multivariate likelihood test. If more than one concept is similar to the current batch, all of the corresponding concepts are merged. GraphPool not only keeps the concepts but also maintains the transitions among them via a first-order Markov chain. The current state is maintained at all times, and new instances are predicted based on it. Keeping these transitions helps the framework recover quickly from drifts in real-world problems with periodic behavior. Comprehensive experiments on synthetic and real-world data show the effectiveness of the framework in terms of both performance and pool management.
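The abstract does not say which concept representation or which likelihood test GraphPool uses, so the Python sketch below is only a minimal illustration under stated assumptions, not the authors' implementation. Each concept is assumed to be summarized by a Gaussian (sample count, mean vector and full covariance, so correlations among features are captured). Similarity is judged with the standard likelihood-ratio test for whether two samples share one multivariate Gaussian. Matching concepts are merged by pooling their sufficient statistics, and the first-order Markov chain is kept as transition counts between concept ids. The names `Concept`, `pool_stats`, `lr_statistic`, `GraphPoolSketch` and the `critical` threshold are hypothetical, and the per-concept classifiers are omitted.

```python
import numpy as np


class Concept:
    """Hypothetical concept summary: sample count, mean vector, MLE covariance.

    Keeping the full covariance (not just per-feature variances) lets the
    summary reflect correlation among features.
    """

    def __init__(self, n, mean, cov):
        self.n, self.mean, self.cov = n, mean, cov

    @classmethod
    def from_batch(cls, X):
        X = np.asarray(X, dtype=float)
        # bias=True divides by n, giving the maximum-likelihood covariance.
        cov = np.atleast_2d(np.cov(X, rowvar=False, bias=True))
        return cls(len(X), X.mean(axis=0), cov)


def pool_stats(a, b):
    """Exact summary of the union of the two samples behind a and b."""
    n = a.n + b.n
    mean = (a.n * a.mean + b.n * b.mean) / n
    da, db = a.mean - mean, b.mean - mean
    cov = (a.n * (a.cov + np.outer(da, da)) + b.n * (b.cov + np.outer(db, db))) / n
    return Concept(n, mean, cov)


def lr_statistic(a, b, ridge=1e-6):
    """-2 log(likelihood ratio) for H0: a and b come from one Gaussian.

    Under H0 this is asymptotically chi-square with d + d(d+1)/2 degrees of freedom.
    """
    eye = np.eye(len(a.mean))

    def logdet(c):
        # A small ridge keeps near-singular covariances invertible.
        return np.linalg.slogdet(c.cov + ridge * eye)[1]

    pooled = pool_stats(a, b)
    return pooled.n * logdet(pooled) - a.n * logdet(a) - b.n * logdet(b)


class GraphPoolSketch:
    """Hypothetical pool of concept summaries plus first-order transition counts."""

    def __init__(self, critical):
        self.critical = critical  # chi-square critical value chosen by the user
        self.concepts = {}        # concept id -> Concept
        self.transitions = {}     # (from id, to id) -> count
        self.current = None       # id of the current state
        self._next_id = 0

    def process_batch(self, X):
        batch = Concept.from_batch(X)
        similar = [cid for cid, c in self.concepts.items()
                   if lr_statistic(c, batch) < self.critical]
        if not similar:
            cid = self._next_id
            self._next_id += 1
            self.concepts[cid] = batch
        else:
            # Assumption: merge every matching concept, then absorb the batch.
            cid = similar[0]
            for other in similar[1:]:
                self._merge(cid, other)
            self.concepts[cid] = pool_stats(self.concepts[cid], batch)
        if self.current is not None:
            key = (self.current, cid)
            self.transitions[key] = self.transitions.get(key, 0) + 1
        self.current = cid
        return cid

    def _merge(self, keep, drop):
        self.concepts[keep] = pool_stats(self.concepts[keep], self.concepts.pop(drop))
        # Redirect every edge that touched `drop` onto `keep`, summing counts.
        merged = {}
        for (src, dst), count in self.transitions.items():
            src = keep if src == drop else src
            dst = keep if dst == drop else dst
            merged[(src, dst)] = merged.get((src, dst), 0) + count
        self.transitions = merged
        if self.current == drop:
            self.current = keep

    def predict_next(self):
        """Most frequent successor of the current state (itself if none seen)."""
        out = {dst: k for (src, dst), k in self.transitions.items() if src == self.current}
        return max(out, key=out.get) if out else self.current
```

For two features the test has 5 degrees of freedom, and the chi-square critical value at the 0.999 level is about 20.5. A larger `critical` treats more batches as recurrences and therefore merges more aggressively.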