Source: The VLDB Journal

Pattern discovery in data streams under the time warping distance

Abstract

Subsequence matching is a basic problem in the field of data stream mining. In recent years, there has been significant research effort spent on efficiently finding subsequences similar to a query sequence. Another challenging issue in relation to subsequence matching is how we identify common local patterns when both sequences are evolving. This problem arises in trend detection, clustering, and outlier detection. Dynamic time warping (DTW) is often used for subsequence matching and is a powerful similarity measure. However, the straightforward method using DTW incurs a high computation cost for this problem. In this paper, we propose a one-pass algorithm, CrossMatch, that achieves the above goal. CrossMatch addresses two important challenges: (1) how can we identify common local patterns efficiently without any omission? (2) how can we find common local patterns in data stream processing? To tackle these challenges, CrossMatch incorporates three ideas: (1) a scoring function, which computes the DTW distance indirectly to reduce the computation cost, (2) a position matrix, which stores starting positions to keep track of common local patterns in a streaming fashion, and (3) a streaming algorithm, which identifies common local patterns efficiently and outputs them on the fly. We provide a theoretical analysis and prove that our algorithm does not sacrifice accuracy. Our experimental evaluation and case studies show that CrossMatch can incrementally discover common local patterns in data streams within constant time (per update) and space.
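The abstract does not give the definitions of the scoring function or the position matrix, so the following Python sketch is illustrative only and is not the paper's CrossMatch algorithm. It shows (a) the classic dynamic-programming DTW distance, which is one way to read the "straightforward method" the abstract calls costly, and (b) a hypothetical local-alignment-style score table paired with a start-position table, to convey how storing starting positions lets a matching pair of subsequences be reported without backtracking. The names `dtw_distance`, `local_score_with_starts`, and the threshold `eps` are assumptions made for this example, and the sketch runs in batch mode over full sequences rather than as a stream.

```python
import math


def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) DTW distance between two numeric sequences."""
    n, m = len(x), len(y)
    # d[i][j] holds the DTW distance between the prefixes x[:i] and y[:j].
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def local_score_with_starts(x, y, eps):
    """Hypothetical sketch (not the paper's definitions).

    Each cell gains `eps - |x_i - y_j|`, so close elements raise the score
    and distant ones lower it; scores are clamped at zero. A parallel table
    records where the warping path that produced each score began.
    Returns (best_score, start_pair, end_pair) using 0-based indices.
    """
    n, m = len(x), len(y)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    start = [[None] * (m + 1) for _ in range(n + 1)]
    best = (0.0, None, None)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            gain = eps - abs(x[i - 1] - y[j - 1])
            # Pick the best-scoring predecessor among the three DTW moves.
            prev_score, prev_start = max(
                (score[i - 1][j - 1], start[i - 1][j - 1]),
                (score[i - 1][j], start[i - 1][j]),
                (score[i][j - 1], start[i][j - 1]),
                key=lambda t: t[0],
            )
            total = prev_score + gain
            if total > 0:
                score[i][j] = total
                # Inherit the start position, or open a new path here.
                start[i][j] = prev_start if prev_score > 0 else (i - 1, j - 1)
            if score[i][j] > best[0]:
                best = (score[i][j], start[i][j], (i - 1, j - 1))
    return best
```

For example, with `x = [0, 0, 1, 2, 3, 0, 0]`, `y = [9, 1, 2, 3, 9]`, and `eps = 0.5`, the second function reports the shared run `1, 2, 3` as starting at positions `(2, 1)` and ending at `(4, 3)`. Both tables here are kept in full for readability; the constant per-update time and space claimed in the abstract would require the paper's own streaming formulation.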
