首页> 外文会议>Machine Learning and Data Mining in Pattern Recognition(MLDM 2007); 20070718-20; Leipzig(DE) >Efficient Subsequence Matching Using the Longest Common Subsequence with a Dual Match Index
【24h】

Efficient Subsequence Matching Using the Longest Common Subsequence with a Dual Match Index

机译:使用最长的公共子序列和双重匹配索引进行有效的子序列匹配

获取原文
获取原文并翻译 | 示例

摘要

The purpose of subsequence matching is to find a query sequence from a long data sequence. Due to the abundance of applications, many solutions have been proposed. Virtually all previous solutions use the Euclidean measure as the basis for measuring distance between sequences. Recent studies, however, suggest that the Euclidean distance often fails to produce proper results due to the irregularity in the data, which is not so uncommon in our problem domain. Addressing this problem, some non-Euclidean measures, such as Dynamic Time Warping (DTW) and Longest Common Subsequence (LCS), have been proposed. However, most of the previous work in this direction focused on the whole sequence matching problem where query and data sequences are the same length. In this paper, we propose a novel subsequence matching framework using a non-Euclidean measure, in particular, LCS, and a new index query scheme. The proposed framework is based on the Dual Match framework where data sequences are divided into a series of disjoint equi-length subsequences and then indexed in an R-tree. We introduced similarity bound for index matching with LCS. The proposed query matching scheme reduces significant numbers of false positives in the match result. Furthermore, we developed an algorithm to skip expensive LCS computations through observing the warping paths. We validated our framework through extensive experiments using 48 different time series datasets. The results of the experiments suggest that our approach significantly improves the subsequence matching performance in various metrics.
机译:子序列匹配的目的是从长数据序列中查找查询序列。由于大量的应用,已经提出了许多解决方案。几乎所有以前的解决方案都使用欧几里得度量作为测量序列之间距离的基础。然而,最近的研究表明,由于数据的不规则性,欧几里得距离通常无法产生适当的结果,这在我们的问题域中并不少见。为了解决这个问题,已经提出了一些非欧几里得度量,例如动态时间规整(DTW)和最长公共子序列(LCS)。但是,此方向上的大多数先前工作都集中在查询和数据序列长度相同的整个序列匹配问题上。在本文中,我们提出了一种使用非欧氏测度(尤其是LCS)的新型子序列匹配框架以及新的索引查询方案。提出的框架基于对偶匹配框架,其中数据序列被分为一系列不相交的等长子序列,然后在R树中进行索引。我们引入了与LCS进行索引匹配的相似性界限。所提出的查询匹配方案减少了匹配结果中的大量误报。此外,我们开发了一种算法,可通过观察翘曲路径来跳过昂贵的LCS计算。我们使用48个不同的时间序列数据集通过广泛的实验验证了我们的框架。实验结果表明,我们的方法可以显着改善各种指标下的子序列匹配性能。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号