Journal: International Journal of Data Warehousing and Mining

Incremental Algorithm for Discovering Frequent Subsequences in Multiple Data Streams

Abstract

In recent years, new applications that produce data streams have emerged, such as stock data and sensor networks. Finding frequent subsequences, or clusters of subsequences, in data streams has therefore become an essential task in data mining. Data streams are continuous in nature, unbounded in size, and have a high arrival rate. Because of these characteristics, traditional clustering algorithms fail to find clusters in data streams effectively. Thus, an efficient incremental algorithm is proposed to find frequent subsequences in multiple data streams. The approach finds frequent subsequences by clustering the subsequences of a data stream. The proposed algorithm uses a window model to buffer the continuous data streams. It does not recompute the clustering results for the whole data stream at every window; instead, it builds on the clustering results of previous windows. The approach also assigns a decay value to each discovered cluster to determine when to remove old clusters and retain recent ones. In addition, the algorithm is efficient because it scans the data streams only once, and it is considered an anytime algorithm because the frequent subsequences are available at the end of every window.
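The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: buffering streams in windows, updating clusters incrementally, and using a per-cluster decay value. Everything specific here is an assumption made for illustration and not the authors' method: the class name `IncrementalSubsequenceClusterer`, the fixed subsequence length `sub_len`, the Euclidean `radius` test, the running-mean centroid update, the `max_decay` counter, and the `min_count` frequency threshold.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Sequence


@dataclass
class Cluster:
    centroid: List[float]  # running mean of the subsequences absorbed so far
    count: int             # number of subsequences absorbed so far
    decay: int             # windows left before an unmatched cluster is dropped


class IncrementalSubsequenceClusterer:
    """Hypothetical sketch: all parameters below are illustrative assumptions."""

    def __init__(self, sub_len: int = 3, radius: float = 1.0,
                 max_decay: int = 2, min_count: int = 2) -> None:
        self.sub_len = sub_len      # assumed fixed subsequence length
        self.radius = radius        # assumed max distance to join a cluster
        self.max_decay = max_decay  # assumed lifetime (in windows) without a match
        self.min_count = min_count  # assumed threshold for "frequent"
        self.clusters: List[Cluster] = []

    def _absorb(self, sub: Sequence[float]) -> Cluster:
        # Assign the subsequence to the nearest cluster, or start a new one.
        best = min(self.clusters, key=lambda c: dist(c.centroid, sub), default=None)
        if best is not None and dist(best.centroid, sub) <= self.radius:
            best.count += 1
            best.centroid = [m + (x - m) / best.count
                             for m, x in zip(best.centroid, sub)]
            return best
        new = Cluster(list(sub), 1, self.max_decay)
        self.clusters.append(new)
        return new

    def process_window(self, window: Dict[str, Sequence[float]]) -> List[Cluster]:
        # One pass over the current window of every stream; earlier windows
        # are never rescanned, only their cluster summaries are reused.
        touched = set()
        for values in window.values():
            for i in range(len(values) - self.sub_len + 1):
                touched.add(id(self._absorb(values[i:i + self.sub_len])))
        survivors = []
        for c in self.clusters:
            # Matched clusters are refreshed; unmatched ones age by one window.
            c.decay = self.max_decay if id(c) in touched else c.decay - 1
            if c.decay > 0:
                survivors.append(c)
        self.clusters = survivors
        return self.frequent()

    def frequent(self) -> List[Cluster]:
        # Results are available after every window (the "anytime" property).
        return [c for c in self.clusters if c.count >= self.min_count]
```

In this sketch, each call to `process_window` takes a dictionary mapping a stream name to that stream's buffered values for the current window and returns the clusters currently considered frequent. A cluster that receives no new subsequence for `max_decay` consecutive windows is removed, which is one simple way to favor recent patterns over old ones.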
