Discovering interesting patterns and associations in data streams.


Abstract

A data stream is a sequence of items that arrive in time order. Unlike data in traditional static databases, data streams are continuous, unbounded, usually arrive at high speed, and have a data value distribution that often changes with time (Guha, 2001). As more applications such as web transactions, telephone records, and network flows generate large numbers of data streams every day, efficient knowledge discovery in data streams has become an active and growing research area in data mining with broad applications. Traditional data mining algorithms were developed to work on a complete static dataset and thus cannot be applied directly in data stream applications.

One area of data mining research is the mining of association relationships in a data set. Most association mining techniques for data streams fall into two types: those based on frequent patterns and those based on closed patterns. Because the number of frequent patterns is often huge, the set of frequent patterns contains many redundant, non-informative patterns. An alternative is to base association mining for data streaming applications on closed patterns, which generally form a small subset of all frequent patterns but provide complete and condensed information. In this line of research, closed pattern mining is the prerequisite for non-redundant and informative association mining.

In this dissertation, a sliding window technique for dynamic mining of closed patterns in data streams is proposed, and an approach for mining non-redundant and informative associations based on the discovered closed patterns is developed. Closed pattern mining and the related association mining were chosen as the research area for three reasons. First, the closed patterns for a given collection of data are currently the most compact form of data knowledge that can provide complete support information for all data patterns. Compared with other techniques, the proposed closed pattern mining technique has the potential to greatly decrease the number of subsequent combinatorial calculations performed on the data patterns. Second, the memory required to store the closed patterns and their associations is generally lower than that required for the corresponding frequent patterns and associations. In some data streaming applications memory usage is an important measure, because it is the bottleneck for knowledge discovery. Third, the associations generated for data streams are the knowledge used to identify relations within the data. The discovered relations have wide application in many data streaming environments.

Unlike closed pattern mining techniques for traditional databases, which require multiple scans of the entire database, the proposed technique determines the closed patterns with a single scan. It is an incremental mining process: as the sliding window advances, new data transactions enter and old data transactions exit the window. Instead of regenerating closed patterns from the entire window, the proposed technique updates the old set of closed patterns whenever a new transaction arrives and/or an old transaction leaves the sliding window, to obtain the current set of closed patterns. This incremental feature lets the user get the most recently updated closed patterns without rescanning the entire updated database, which saves not only computation time but, more importantly, the I/O time spent moving data between the database and memory. In addition, the proposed sliding window technique handles insertion and deletion operations independently, which allows the user to adjust the sliding window size for different application environments. Furthermore, the proposed interesting pattern and association mining framework can handle requests from different users at the same time, each with its own support and confidence thresholds and its own input and output patterns of interest.

The research includes both theoretical proofs of correctness for the proposed algorithms and simulation experiments that compare the proposed techniques with those in the literature using synthetic and real datasets. The utility of the proposed technique is demonstrated by applying it to sensor network databases from a traffic management site and an environmental monitoring site for the purpose of missing data estimation.
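
The incremental idea described in the abstract can be illustrated with a small sketch. This is not the dissertation's algorithm, which the abstract does not spell out; it is a minimal illustration that relies on a standard property of closed itemsets, namely that they are exactly the non-empty intersections of transactions. The class name `SlidingWindowClosedPatterns`, its methods, and the toy stream are hypothetical names chosen for this example.

```python
from collections import deque


class SlidingWindowClosedPatterns:
    """Illustrative sketch (not the dissertation's method): keeps the closed
    itemsets of the last `window_size` transactions and their supports,
    updating them on each insertion and deletion without rescanning the window."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()   # transactions currently in the window
        self.support = {}       # closed itemset (frozenset) -> support count

    def _insert(self, t):
        old = self.support
        # New closed sets can only be t itself or t intersected with an old closed set.
        candidates = {t} | {c & t for c in old if c & t}
        updated = dict(old)
        for p in candidates:
            # Old support of p equals the largest support among old closed supersets.
            prior = max((s for c, s in old.items() if p <= c), default=0)
            updated[p] = prior + 1
        self.support = updated

    def _delete(self, t):
        # Only subsets of the leaving transaction lose one supporting transaction.
        sup = {c: s - (1 if c <= t else 0) for c, s in self.support.items()}
        sup = {c: s for c, s in sup.items() if s > 0}
        # An affected set stops being closed if a proper superset has equal support.
        self.support = {
            c: s for c, s in sup.items()
            if not (c <= t and any(c < d and s == sd for d, sd in sup.items()))
        }

    def add(self, transaction):
        t = frozenset(transaction)
        if not t:
            return  # ignore empty transactions
        self.window.append(t)
        self._insert(t)
        if len(self.window) > self.window_size:
            self._delete(self.window.popleft())

    def closed_patterns(self, min_support=1):
        return {c: s for c, s in self.support.items() if s >= min_support}


if __name__ == "__main__":
    miner = SlidingWindowClosedPatterns(window_size=3)
    for tx in ["abc", "ab", "ac", "bc"]:   # toy stream; each string is a set of items
        miner.add(tx)
    for itemset, count in sorted(miner.closed_patterns(2).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), count)
```

Insertion and deletion are separate methods here, so the window could be grown or shrunk by calling one without the other. The sketch compares each candidate against every stored pattern, which is acceptable for illustration but would need an index structure to scale.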

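Bibliographic record

  • Author

    Jiang, Nan.

  • Author affiliation

    The University of Oklahoma.

  • Degree-granting institution The University of Oklahoma.
  • Subject Computer Science.
  • Degree Ph.D.
  • Year 2009
  • Pages 167 p.
  • Total pages 167
  • Original format PDF
  • Language of text eng
  • Chinese Library Classification Automation technology, computer technology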