Conference: IEEE International Conference on Data Mining

Fast and Flexible Multivariate Time Series Subsequence Search

Abstract

Multivariate time series (MTS) are ubiquitous and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns in these MTS databases, which can contain up to several gigabytes of data. Surprisingly, research on MTS search is very limited. Most existing work supports only queries whose length equals that of the data, or queries on a fixed set of variables. In this paper, we propose an efficient and flexible subsequence search framework for massive MTS databases that, for the first time, enables querying on any subset of variables with arbitrary time delays between them. We propose two provably correct algorithms to solve this problem: (1) an R*-tree Based Search (RBS), which uses Minimum Bounding Rectangles (MBRs) to organize the subsequences, and (2) a List Based Search (LBS) algorithm, which uses sorted lists for indexing. We demonstrate the performance of these algorithms on two large MTS databases from the aviation domain, each containing several million observations. Both tests show that our algorithms have very high prune rates (>95%), so actual disk access is needed for less than 5% of the observations. To the best of our knowledge, this is the first flexible MTS search algorithm capable of subsequence search on any subset of variables. Moreover, MTS subsequence search has never been attempted on datasets of the size used in this paper.
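To make the query model in the abstract concrete, the following is a minimal Python sketch of a subsequence search over a subset of variables with a time delay between them. It is not the paper's RBS or LBS algorithm. It swaps in a simpler, well-known filter: a sorted list of window means per variable, pruned with the bound |mean(a) - mean(b)| * sqrt(w) <= ||a - b|| for length-w windows, followed by exact verification. The function names (`build_index`, `candidates`, `search`), the variable names (`altitude`, `speed`), and the per-variable Euclidean threshold `eps` are all assumptions made for illustration.

```python
import bisect
import math


def window_means(series, w):
    """Mean of every length-w window, computed with a running sum."""
    total = sum(series[:w])
    means = [total / w]
    for i in range(w, len(series)):
        total += series[i] - series[i - w]
        means.append(total / w)
    return means


def build_index(series, w):
    """Sorted list of (window mean, start position) pairs for one variable."""
    return sorted((m, s) for s, m in enumerate(window_means(series, w)))


def candidates(index, pattern, eps):
    """Start positions that survive the mean-based lower-bound filter.

    A window whose mean differs from the pattern mean by more than
    eps / sqrt(w) cannot be within Euclidean distance eps, so it is pruned.
    A tiny tolerance guards against floating-point rounding.
    """
    w = len(pattern)
    q = sum(pattern) / w
    band = eps / math.sqrt(w) + 1e-9
    lo = bisect.bisect_left(index, (q - band, -1))
    hi = bisect.bisect_right(index, (q + band, float("inf")))
    return {s for _, s in index[lo:hi]}


def distance(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(db, indexes, query, eps):
    """Return reference times t where every queried variable matches.

    db:      dict mapping variable name -> list of floats (equal lengths)
    indexes: dict mapping variable name -> build_index(db[name], w)
    query:   dict mapping variable name -> (pattern, delay); every pattern
             has length w, delays are non-negative, the smallest delay is 0
    eps:     assumed per-variable Euclidean distance threshold
    """
    surviving = None
    for name, (pattern, delay) in query.items():
        # Shift each candidate start back by its delay to get a reference time.
        starts = {s - delay for s in candidates(indexes[name], pattern, eps)}
        surviving = starts if surviving is None else surviving & starts
    matches = []
    for t in sorted(surviving or ()):
        # Exact check: only candidates that passed every filter reach here.
        if all(
            distance(db[name][t + delay : t + delay + len(pattern)], pattern) <= eps
            for name, (pattern, delay) in query.items()
        ):
            matches.append(t)
    return matches


# Hypothetical two-variable database; the query uses both variables,
# with the speed pattern expected 2 steps after the altitude pattern.
db = {
    "altitude": [0, 0, 1, 2, 3, 0, 0, 1, 2, 3, 0, 0],
    "speed": [5, 5, 5, 5, 9, 8, 5, 5, 5, 5, 5, 5],
}
w = 3
indexes = {name: build_index(series, w) for name, series in db.items()}
query = {"altitude": ([1, 2, 3], 0), "speed": ([9, 8, 5], 2)}
print(search(db, indexes, query, eps=0.5))  # prints [2]
```

In this toy data the altitude pattern occurs at positions 2 and 7, but only position 2 is followed two steps later by the speed pattern, so the filter on speed removes position 7 before any exact distance is computed. Because the indexes are built per variable, a query can name any subset of the indexed variables, which mirrors the flexibility the abstract describes.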
