首页> 外文期刊>Data mining and knowledge discovery >A Subsequence Matching Algorithm that Supports Normalization Transform in Time-Series Databases
【24h】

A Subsequence Matching Algorithm that Supports Normalization Transform in Time-Series Databases

机译:一种在时间序列数据库中支持归一化转换的子序列匹配算法

获取原文
获取原文并翻译 | 示例
       

摘要

In this paper, an algorithm is proposed for subsequence matching that supports normalization transform in time-series databases. Normalization transform enables finding sequences with similar fluctuation patterns even though they are not close to each other before the normalization transform. Simple application of existing subsequence matching algorithms to support normalization transform is not feasible since the algorithms do not have information for normalization transform of subsequences of arbitrary lengths. Application of the existing whole matching algorithm supporting normalization transform to the subsequence matching is feasible, but requires an index for every possible length of the query sequence causing serious overhead on both storage space and update time. The proposed algorithm generates indexes only for a small number of different lengths of query sequences. For subsequence matching it selects the most appropriate index among them. Better search performance can be obtained by using more indexes. In this paper, the approach is called index interpolation. It is formally proved that the proposed algorithm does not cause false dismissal. The search performance can be traded off with storage space by adjusting the number of indexes. For performance evaluation, a series of experiments is conducted using the indexes for only five different lengths out of lengths 256~512 of the query sequence. The results show that the proposed algorithm outperforms the sequential scan by up to 2.4 times on the average when the selectivity of the query is 10~(-2) and up to 14.6 times when it is 10~(-5). Since the proposed algorithm performs better with smaller selectivities, it is suitable for practical situations, where the queries with smaller selectivities are much more frequent.
机译:在本文中,提出了一种算法,用于随后匹配支持时间序列数据库中的归一化变换的匹配。归一化变换使得能够找到具有类似波动模式的序列,即使它们在归一化变换之前没有彼此接近。简单地应用现有子匹配算法以支持归一化变换是不可行的,因为算法没有具有任意长度的后续变换的归一化转换的信息。在支持归一化转换到子匹配中的现有整个匹配算法的应用是可行的,但需要一个索引的查询序列的所有可能长度的索引,导致存储空间和更新时间的严重开销。该算法仅为少量不同的查询序列生成索引。对于后续匹配,它选择其中最合适的索引。可以通过使用更多索引来获得更好的搜索性能。在本文中,该方法称为索引插值。正式证明该算法不会导致错误解雇。通过调整索引的数量,可以使用存储空间进行搜索性能。对于性能评估,使用索引仅在查询序列的长度256〜512中仅为5个不同长度进行一系列实验。结果表明,当查询的选择性为10〜(2)时,所提出的算法在平均值上的平均值最高扫描速度高达2.4倍,当时为10〜(-5)。由于所提出的算法更好地利用较小的选择性来执行,因此适用于实际情况,其中具有较小选择性的疑问更频繁。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号