Addressing Big Data Time Series: Mining Trillions of Time Series Subsequences Under Dynamic Time Warping

THANAWIN RAKTHANMANON; BILSON CAMPANA; ABDULLAH MUEEN; GUSTAVO BATISTA; BRANDON WESTOVER; QIANG ZHU; JESIN ZAKARIA; EAMONN KEOGH

Abstract

Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms, including classification, clustering, motif discovery, anomaly detection, and so on. The difficulty of scaling a search to large datasets explains to a great extent why most academic work on time series data mining has plateaued at a few million time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine massive time series for the first time. We demonstrate the following unintuitive fact: in large datasets we can exactly search under Dynamic Time Warping (DTW) much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We explain how our ideas allow us to solve higher-level time series data mining problems such as motif discovery and clustering at scales that would otherwise be untenable. Moreover, we show how our ideas allow us to efficiently support the uniform scaling distance measure, a measure whose utility seems underappreciated and which we demonstrate here. In addition to mining massive datasets with up to one trillion datapoints, we show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower-powered devices than is currently possible.
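The abstract does not describe the four ideas themselves, so the sketch below only illustrates the general setting they optimize: exact nearest-neighbor subsequence search under DTW, where each candidate window is z-normalized, cheaply screened with the LB_Keogh lower bound, and passed to a full DTW computation only if the bound cannot rule it out. It is a minimal Python sketch under our own assumptions (squared-error DTW, a Sakoe-Chiba band of half-width `r`, equal-length query and candidate windows); the function names are hypothetical, and it is not the authors' code or a reproduction of their specific optimizations.

```python
import math

# Illustrative sketch only; helper names are hypothetical, not the authors' implementation.


def z_normalize(x):
    """Rescale x to zero mean and unit (population) standard deviation."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        return [0.0] * n  # a flat window has no shape; map it to all zeros
    return [(v - mean) / std for v in x]


def envelope(q, r):
    """Upper/lower envelope of q over a Sakoe-Chiba band of half-width r."""
    n = len(q)
    upper = [max(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    lower = [min(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    return upper, lower


def lb_keogh(c, upper, lower, best_so_far=math.inf):
    """LB_Keogh lower bound on the squared banded DTW distance between q and c.

    Stops accumulating as soon as the partial sum reaches best_so_far.
    """
    total = 0.0
    for v, u, l in zip(c, upper, lower):
        if v > u:
            total += (v - u) ** 2
        elif v < l:
            total += (v - l) ** 2
        if total >= best_so_far:
            break
    return total


def dtw(q, c, r, best_so_far=math.inf):
    """Squared-error DTW of equal-length q and c, restricted to |i - j| <= r.

    Returns math.inf early once every cell in a row is >= best_so_far,
    since costs are non-negative and the final value cannot drop below that.
    """
    n = len(q)
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (n + 1)
        row_min = math.inf
        for j in range(max(1, i - r), min(n, i + r) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
            row_min = min(row_min, curr[j])
        if row_min >= best_so_far:
            return math.inf
        prev = curr
    return prev[n]


def subsequence_search(data, query, r):
    """Exact 1-NN search under banded DTW over every z-normalized window of data."""
    m = len(query)
    q = z_normalize(query)
    upper, lower = envelope(q, r)
    best_so_far, best_loc = math.inf, -1
    for start in range(len(data) - m + 1):
        c = z_normalize(data[start:start + m])
        # Cheap O(m) bound first; skip the O(m * r) DTW if it cannot win.
        if lb_keogh(c, upper, lower, best_so_far) >= best_so_far:
            continue
        d = dtw(q, c, r, best_so_far)
        if d < best_so_far:
            best_so_far, best_loc = d, start
    return best_loc, best_so_far
```

For example, searching `[5.0] * 10 + [10 + 2 * v for v in query] + [5.0] * 10` for `query = [0, 1, 2, 3, 2, 1, 0, -1]` with `r=2` should return start index 10 with a distance of essentially zero, because z-normalization removes the offset and scale of the embedded copy. The ordering matters: LB_Keogh costs O(m) per window while banded DTW costs O(m·r), so windows pruned by the bound never reach the expensive step.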

Bibliographic record

Source
ACM Transactions on Knowledge Discovery from Data, 2013, No. 3, pp. 10.1-10.31 (31 pages)
Authors
THANAWIN RAKTHANMANON; BILSON CAMPANA; ABDULLAH MUEEN; GUSTAVO BATISTA; BRANDON WESTOVER; QIANG ZHU; JESIN ZAKARIA; EAMONN KEOGH
Author affiliations

University of California, Riverside, and Department of Computer Engineering, Kasetsart University, Thailand

Department of Computer Science and Engineering, University of California, Riverside

Department of Computer Science and Engineering, University of California, Riverside

Instituto de Ciências Matemáticas e de Computação, University of São Paulo

Brigham and Women's Hospital

Department of Computer Science and Engineering, University of California, Riverside

Department of Computer Science and Engineering, University of California, Riverside

Department of Computer Science and Engineering, University of California, Riverside

Indexing information
Original format: PDF
Language: English
Keywords
Time series; similarity search; lower bounds

Similar documents

1. Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping [J]. Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen. SIGKDD Explorations, 2012 (CD-ROM).

2. Subsequence matching under time warping in time-series databases: observation, optimization, and performance results [J]. Sang-Wook Kim, Miyoung Shin. International Journal of Computer Systems Science & Engineering, 2008, No. 1.

3. Optimizing dynamic time warping's window width for time series data mining applications [J]. Hoang Anh Dau, Diego Furtado Silva, François Petitjean. Data Mining and Knowledge Discovery, 2018, No. 4.

4. Data Mining a Trillion Time Series Subsequences Under Dynamic Time Warping [C]. Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen. Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013.

5. Improving efficiency and effectiveness of dynamic time warping in large time series databases [D]. Chotirat Ratanamahatana. 2005.

6. Addressing Big Data Time Series: Mining Trillions of Time Series Subsequences Under Dynamic Time Warping [O]. Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen.

7. Searching and mining trillions of time series subsequences under dynamic time warping [O]. Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen. 2012.

