Journal: Data mining and knowledge discovery

A bit level representation for time series data mining with shape based similarity



Abstract

Clipping is the process of transforming a real-valued series into a sequence of bits representing whether each data point is above or below the average. In this paper, we argue that clipping is a useful and flexible transformation for the exploratory analysis of large time-dependent data sets. We demonstrate how time series stored as bits can be very efficiently compressed and manipulated and that, under some assumptions, the discriminatory power with clipped series is asymptotically equivalent to that achieved with the raw data. Unlike other transformations, clipped series can be compared directly to the raw data series. We show that this means we can form a tight lower-bounding metric for Euclidean and Dynamic Time Warping distance and hence efficiently query by content. Clipped data can be used in conjunction with a host of algorithms and statistical tests that naturally follow from the binary nature of the data. A series of experiments illustrates how clipped series can be used in increasingly complex ways to achieve better results than other popular representations. The usefulness of the proposed representation is demonstrated by the fact that the results with clipped data are consistently better than those achieved with a Wavelet or Discrete Fourier Transformation at the same compression ratio for both clustering and query by content. The flexibility of the representation is shown by the fact that we can take advantage of a variable Run Length Encoding of clipped series to define an approximation of the Kolmogorov complexity and hence perform Kolmogorov-based clustering.
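As a rough illustration of the clipping step described in the abstract, the sketch below thresholds a series at its own mean, packs the resulting bits into an integer, and compares two clipped series bit by bit. The function names (`clip`, `pack`, `hamming`) and the sample values are hypothetical, and the Hamming distance is used only as a simple stand-in: it is not the lower-bounding metric for Euclidean or Dynamic Time Warping distance that the paper derives.

```python
def clip(series):
    """Return 1 where a value is above the series mean, else 0."""
    mean = sum(series) / len(series)
    return [1 if x > mean else 0 for x in series]


def pack(bits):
    """Pack a list of bits into one Python int (first bit most significant)."""
    word = 0
    for b in bits:
        word = (word << 1) | b
    return word


def hamming(a, b):
    """Count the bit positions in which two packed series differ."""
    return bin(a ^ b).count("1")


# Hypothetical sample data, chosen only to make the example easy to follow.
series_a = [1.0, 3.0, 5.0, 2.0, 0.5, 4.0]
series_b = [2.0, 2.5, 0.1, 0.2, 3.0, 3.5]

bits_a = clip(series_a)  # [0, 1, 1, 0, 0, 1]
bits_b = clip(series_b)  # [1, 1, 0, 0, 1, 1]
distance = hamming(pack(bits_a), pack(bits_b))  # 3
```

Packing the bits into an integer lets one XOR and a population count compare whole series at once, which is one plausible reading of why bit-level storage can be manipulated efficiently.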
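The abstract also mentions run-length encoding of clipped series as a route to approximating Kolmogorov complexity. The sketch below shows plain run-length encoding of a bit sequence and uses the number of runs as a crude complexity proxy. This is an assumption made for illustration: the paper's variable Run Length Encoding scheme and its complexity-based distance are not specified on this page, and `run_length_encode`, `run_length_decode`, and `complexity_proxy` are hypothetical names.

```python
from itertools import groupby


def run_length_encode(bits):
    """Collapse a bit sequence into (bit, run_length) pairs."""
    return [(bit, sum(1 for _ in group)) for bit, group in groupby(bits)]


def run_length_decode(runs):
    """Expand (bit, run_length) pairs back into the original bit sequence."""
    out = []
    for bit, length in runs:
        out.extend([bit] * length)
    return out


def complexity_proxy(bits):
    """Hypothetical stand-in for compressed size: the number of runs."""
    return len(run_length_encode(bits))


# Hypothetical clipped series: one changes level once, one flips often.
smooth = [0, 0, 0, 0, 1, 1, 1, 1]
noisy = [0, 1, 0, 1, 1, 0, 1, 0]

smooth_runs = run_length_encode(smooth)  # [(0, 4), (1, 4)]
noisy_cost = complexity_proxy(noisy)     # 7 runs
```

A series that crosses its mean rarely encodes into few runs, while a series that crosses it often needs many, so the run count gives a cheap ordering of series by apparent complexity.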
