TS-stream: clustering time series on data streams

Cassio M. M. Pereira; Rodrigo F. de Mello

Abstract

The current ability to produce massive amounts of data, together with the impossibility of storing all of it, has motivated the development of data stream mining strategies. Although many techniques have been proposed, this research area still lacks approaches for mining data streams composed of multiple time series, a setting with applications in finance, medicine and science. Most current techniques for clustering streaming time series have a serious limitation in their similarity measures, which are based on the Pearson correlation. In this paper, we show that the Pearson correlation is not capable of detecting similarities even for classic time series models, such as those by Box and Jenkins. This limitation motivated our proposal to cluster streaming time series based on their generating functions, which is achieved by considering features obtained using descriptive measures, such as Auto Mutual Information, the Hurst Exponent and several others. We present a new tree-based clustering algorithm, entitled TS-Stream, which uses the extracted features to produce partitions in better accordance with the time series generating functions. Experiments with synthetic data sets confirm that TS-Stream outperforms ODAC, currently the most popular technique, in terms of clustering quality. Using real financial time series from the NYSE and NASDAQ, we conducted stock trading simulations employing TS-Stream to support the creation of diversified investment portfolios. Results confirmed that TS-Stream increased monetary returns by several orders of magnitude when compared to trading strategies based simply on the Moving Average Convergence Divergence financial indicator.
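
The abstract's point about the Pearson correlation can be illustrated with a small simulation. The sketch below is not the authors' method or experimental setup. It assumes a simple AR(1) model (one of the Box and Jenkins family) and uses the lag-1 autocorrelation as a stand-in for the descriptive features named in the abstract. The function names `pearson`, `ar1` and `lag1_autocorr`, the coefficient 0.8, the series length and the seed are all illustrative choices. Two independent realizations of the same process should be nearly uncorrelated with each other, while a feature computed on each series separately should come out nearly the same.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def ar1(phi, n, rng):
    """Simulate n points of x_t = phi * x_{t-1} + e_t with e_t ~ N(0, 1)."""
    x, series = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def lag1_autocorr(x):
    """Lag-1 autocorrelation, estimated as the Pearson correlation
    between the series and itself shifted by one step.
    Used here only as a simple example of a per-series feature."""
    return pearson(x[:-1], x[1:])


rng = random.Random(42)   # fixed seed (arbitrary) for repeatability
a = ar1(0.8, 5000, rng)   # two independent realizations of the same model
b = ar1(0.8, 5000, rng)
c = ar1(-0.8, 5000, rng)  # a different generating process

cross_corr = pearson(a, b)
feat_a, feat_b, feat_c = lag1_autocorr(a), lag1_autocorr(b), lag1_autocorr(c)

print(f"Pearson(a, b) = {cross_corr:.3f}")
print(f"lag-1 autocorrelation: a = {feat_a:.3f}, b = {feat_b:.3f}, c = {feat_c:.3f}")
```

Under these assumptions, a correlation-based measure would treat `a` and `b` as unrelated even though they share a generating function, whereas comparing per-series features groups `a` with `b` and separates both from `c`.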

Bibliographic record

Source
Journal of Intelligent Information Systems, 2014, no. 3, pp. 531-566 (36 pages)
Authors
Cassio M. M. Pereira; Rodrigo F. de Mello
Affiliation (both authors)

Institute of Mathematical and Computer Sciences (ICMC-USP), University of Sao Paulo, Av. Trabalhador Sao-carlense 400, Sao Carlos, SP 13566-590, Brazil

Original format
PDF
Language
English
Keywords
Data streams; Clustering; Time series; Decision trees
