Categorizing and Mining Concept Drifting Data Streams


Abstract

Mining concept drifting data streams is a defining challenge for data mining research. Recent years have seen a large body of work on detecting changes and building prediction models from stream data, but with only a vague understanding of the types of concept drifting and of how each type affects mining algorithms. In this paper, we first categorize concept drifting into two scenarios, Loose Concept Drifting (LCD) and Rigorous Concept Drifting (RCD), and then propose a solution for each. For LCD data streams, because concepts in adjacent data chunks are sufficiently close to each other, we apply the kernel mean matching (KMM) method to minimize the discrepancy between the data chunks in the kernel space. This minimization produces weighted instances, which are used to build a classifier ensemble that handles concept drifting data streams. For RCD data streams, because genuine concepts in adjacent data chunks may change randomly and rapidly, we propose a new Optimal Weights Adjustment (OWA) method to determine the optimum weight values for classifiers trained from the most recent (up-to-date) data chunk, such that those classifiers form an accurate classifier ensemble to predict instances in the yet-to-come data chunk. Experiments on synthetic and real-world datasets show that the weighted instance approach is preferable when the concept drifting is mainly caused by changes in the class prior probability, whereas the weighted classifier approach is preferable when the concept drifting is mainly triggered by changes in the conditional probability.
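
To make the weighted classifier idea concrete, here is a minimal sketch in Python. It is not the paper's OWA method, whose weight computation the abstract does not specify. As a stand-in, it weights each classifier by its accuracy on the most recent labeled chunk and then predicts by weighted vote. The names `accuracy_weights` and `ensemble_predict`, the toy classifiers, and the toy chunk are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A classifier is assumed to be any callable mapping a feature vector to a label.
Classifier = Callable[[Sequence[float]], int]
Chunk = List[Tuple[Sequence[float], int]]


def accuracy_weights(classifiers: List[Classifier], chunk: Chunk) -> List[float]:
    """Weight each classifier by its accuracy on the latest labeled chunk.

    Assumption: accuracy-proportional weights stand in for an optimized
    weighting scheme; weights are normalized to sum to 1.
    """
    scores = []
    for clf in classifiers:
        correct = sum(1 for x, y in chunk if clf(x) == y)
        scores.append(correct / len(chunk))
    total = sum(scores)
    if total == 0:
        # No classifier is ever right: fall back to uniform weights.
        return [1.0 / len(classifiers)] * len(classifiers)
    return [s / total for s in scores]


def ensemble_predict(classifiers: List[Classifier], weights: List[float],
                     x: Sequence[float]) -> int:
    """Return the label with the largest total weight among the votes."""
    votes: Dict[int, float] = {}
    for clf, w in zip(classifiers, weights):
        label = clf(x)
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=lambda label: votes[label])


# Toy usage: three hypothetical base classifiers and one recent chunk.
always_zero = lambda x: 0
always_one = lambda x: 1
threshold = lambda x: 1 if x[0] > 0.5 else 0

classifiers = [always_zero, always_one, threshold]
recent_chunk = [([0.1], 0), ([0.9], 1), ([0.8], 1), ([0.2], 0)]

weights = accuracy_weights(classifiers, recent_chunk)
prediction = ensemble_predict(classifiers, weights, [0.7])
```

In this toy setting the threshold classifier fits the recent chunk best, so it receives the largest weight and dominates the vote on instances from the next chunk. A scheme closer to what the abstract describes would instead solve for the weights that minimize the ensemble's error on the most recent chunk.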


Bibliographic record

Source: ACM KDD International Conference on Knowledge Discovery and Data Mining; KDD 2008 | 2008 | pp. 794-802 | 9 pages
Authors: Peng Zhang; Xingquan Zhu; Yong Shi
Original format: PDF
CLC classification: Information and knowledge dissemination
Keywords: classification; ensemble learning; data streams; concept drifting

