Mining Massive-Scale Time Series Data using Hashing

Abstract

Similarity search on time series is a frequent operation in large-scale data-driven applications. Sophisticated similarity measures are standard for time series matching because time series are usually misaligned. Dynamic Time Warping (DTW) is the most widely used similarity measure for time series because it performs alignment and matching at the same time. However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch-and-bound pruning strategies are adopted. However, branch-and-bound pruning is only useful for very short queries (low-dimensional time series), and the bounds are quite weak for longer queries. Because the bounds are loose, branch-and-bound pruning degenerates into a brute-force search. To circumvent this issue, we design SSH (Sketch, Shingle, & Hashing), an efficient, approximate hashing scheme that is much faster than the state-of-the-art branch-and-bound search technique, the UCR suite. SSH uses a novel combination of sketching, shingling, and hashing techniques to produce (probabilistic) indexes that align (near perfectly) with the DTW similarity measure. The generated indexes are then used to create hash buckets for sub-linear search. Empirical results on two large-scale benchmark time series datasets show that our proposed method prunes around 95% of time series candidates and can be around 20 times faster than the state-of-the-art package (UCR suite) without any significant loss in accuracy.
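
The two ideas in the abstract can be illustrated with a minimal Python sketch. The first function is the textbook dynamic-programming DTW, whose nested loops show why alignment is expensive. The remaining functions are only an illustrative guess at a sketch, shingle, and hash pipeline: the abstract does not specify the steps, so the sliding-window random-projection sign bits, the bit n-gram shingles, the plain MinHash, and all names and parameters (`window`, `step`, `n`, `num_hashes`) are assumptions, not the author's method.

```python
import random


def dtw_distance(a, b):
    """Textbook DTW with squared point-wise cost; O(len(a) * len(b)) time."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def sketch_bits(series, window, step, seed=0):
    """Sketch (assumed form): sign bit of a random projection of each window."""
    rng = random.Random(seed)
    r = [rng.gauss(0.0, 1.0) for _ in range(window)]
    bits = []
    for start in range(0, len(series) - window + 1, step):
        dot = sum(r[k] * series[start + k] for k in range(window))
        bits.append(1 if dot >= 0 else 0)
    return bits


def shingles(bits, n):
    """Shingle (assumed form): set of length-n bit patterns in the sketch."""
    return {tuple(bits[i:i + n]) for i in range(len(bits) - n + 1)}


def minhash_signature(shingle_set, num_hashes, seed=1):
    """Hash (assumed form): plain MinHash; needs a non-empty shingle set."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    return tuple(min(hash((salt, s)) for s in shingle_set) for salt in salts)


def ssh_like_signature(series, window=8, step=1, n=4, num_hashes=4):
    """Hypothetical end-to-end pipeline: sketch, then shingle, then hash."""
    bits = sketch_bits(series, window, step)
    return minhash_signature(shingles(bits, n), num_hashes)
```

In a setup like this, the signature (or parts of it) would serve as a hash-bucket key, so that the quadratic `dtw_distance` is computed only for the candidates that share a bucket with the query rather than for the whole collection.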

Bibliographic record

  • Author: Luo Chen
  • Year: 2017
  • Original format: PDF
  • Language: English
