
Adaptive Learning and Mining for Data Streams and Frequent Patterns



Abstract

This thesis is devoted to the design of data mining algorithms for evolving data streams and for the extraction of closed frequent trees. First, we deal with each of these tasks separately, and then we deal with them together, developing classification methods for data streams containing items that are trees. In the data stream model, data arrive at high speed, and the algorithms that must process them have very strict constraints on space and time. In the first part of this thesis we propose and illustrate a framework for developing algorithms that can adaptively learn from data streams that change over time. Our methods are based on using change detectors and estimator modules at the right places. We propose an adaptive sliding window algorithm, ADWIN, for detecting change and keeping updated statistics from a data stream, and use it as a black box in place of counters or accumulators in algorithms initially not designed for drifting data. Since ADWIN has rigorous performance guarantees, this opens the possibility of extending such guarantees to learning and mining algorithms. We test our methodology with several learning methods, such as Naive Bayes, clustering, decision trees and ensemble methods. We build an experimental framework for data stream mining with concept drift, based on the MOA framework, similar to WEKA, so that it will be easy for researchers to run experimental data stream benchmarks. Trees are connected acyclic graphs and they are studied as link-based structures in many cases. In the second part of this thesis, we describe a rather formal study of trees from the point of view of closure-based mining. Moreover, we present efficient algorithms for subtree testing and for mining ordered and unordered frequent closed trees.
We include an analysis of the extraction of association rules of full confidence out of the closed sets of trees, and we have found there an interesting phenomenon: rules whose propositional counterpart is nontrivial are, however, always implicitly true in trees due to the peculiar combinatorics of the structures. Finally, using these results on evolving data stream mining and closed frequent tree mining, we present high-performance algorithms for mining closed unlabeled rooted trees adaptively from data streams that change over time. We introduce a general methodology to identify closed patterns in a data stream, using Galois Lattice Theory. Using this methodology, we then develop an incremental method, a sliding-window-based method, and finally a method that mines closed trees adaptively from data streams. We use these methods to develop classification methods for tree data streams.
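The adaptive sliding window idea mentioned in the abstract can be illustrated with a small sketch. This is not the thesis implementation: it is a naive variant that stores every element and tests every split of the window, whereas ADWIN itself is described as using a compressed window representation to meet strict space and time constraints. The class name `SimpleAdaptiveWindow`, its methods, and the default `delta` are hypothetical choices made for illustration. The sketch assumes inputs in [0, 1] and uses a Hoeffding-style cut threshold.

```python
import math
from collections import deque


class SimpleAdaptiveWindow:
    """Naive adaptive window over values in [0, 1] (illustrative only).

    The window grows while the data look stationary and drops its oldest
    elements when two sub-windows have significantly different means.
    """

    def __init__(self, delta=0.01):
        self.delta = delta      # confidence parameter (assumed default)
        self.window = deque()   # stored values, oldest first
        self.total = 0.0        # running sum of the stored values

    def mean(self):
        """Current estimate: the average of the values in the window."""
        return self.total / len(self.window) if self.window else 0.0

    def _cut_found(self):
        """Return True if some split old|recent differs beyond the bound."""
        n = len(self.window)
        if n < 2:
            return False
        delta_prime = self.delta / n
        head_sum = 0.0
        for i, x in enumerate(self.window, start=1):
            if i == n:
                break                              # recent part would be empty
            head_sum += x
            n0, n1 = i, n - i
            m = 1.0 / (1.0 / n0 + 1.0 / n1)        # harmonic-style size term
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            mean0 = head_sum / n0
            mean1 = (self.total - head_sum) / n1
            if abs(mean0 - mean1) > eps_cut:
                return True
        return False

    def add(self, x):
        """Insert x; shrink the window while a change is detected.

        Returns True if at least one old element was dropped.
        """
        self.window.append(x)
        self.total += x
        changed = False
        while self._cut_found():
            self.total -= self.window.popleft()
            changed = True
        return changed


if __name__ == "__main__":
    w = SimpleAdaptiveWindow()
    stream = [0.0] * 200 + [1.0] * 200   # abrupt change halfway through
    for t, value in enumerate(stream):
        if w.add(value):
            print("window shrunk at step", t, "new length", len(w.window))
    print("final estimate:", w.mean())
```

Used as a drop-in replacement for a counter or accumulator, such an object gives the enclosing learner an estimate that tracks the recent distribution rather than the whole history. The quadratic cost of this naive version makes it suitable only for small experiments.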

Bibliographic details

  • Source
    SIGKDD Explorations | 2009, Issue 1 | 3 pages
  • Author

    Albert Bifet

  • Original format: PDF
  • Language: English
  • Chinese Library Classification: TP274.2
