Efficient processing of similarity search under time warping in sequence databases: an index-based approach

Sang-Wook Kim; Sanghyun Park; Wesley W. Chu

Abstract

This paper discusses the efficient processing of similarity search that supports time warping in large sequence databases. Time warping enables sequences with similar patterns to be found even when they are of different lengths. Prior methods for processing similarity search under time warping could not employ multi-dimensional indexes without false dismissal, because the time warping distance does not satisfy the triangle inequality. They have to scan the entire database and thus suffer serious performance degradation in large databases. Another method, which employs a suffix tree and does not assume any distance function, also performs poorly because of the large tree size. In this paper, we propose a novel method for similarity search that supports time warping. Our primary goal is to enhance the search performance in large databases without permitting any false dismissal. To attain this goal, we have devised a new distance function, D_(tw-lb), which consistently underestimates the time warping distance and satisfies the triangle inequality. D_(tw-lb) uses a 4-tuple feature vector that is extracted from each sequence and is invariant to time warping. For the efficient processing of similarity search, we employ a multi-dimensional index that uses the 4-tuple feature vector as indexing attributes and D_(tw-lb) as a distance function. We prove that our method does not incur false dismissal. To verify the superiority of our method, we have performed extensive experiments. The results reveal that our method is up to 43 times faster on a data set containing real-world S&P 500 stock data sequences, and up to 720 times faster on data sets containing a very large volume of synthetic data sequences. The performance gain increases (1) as the number of data sequences increases, (2) as the average length of data sequences increases, and (3) as the tolerance in a query decreases. Considering the characteristics of real databases, these tendencies imply that our approach is suitable for practical applications.
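
The filter-and-refine idea in the abstract can be illustrated with a short Python sketch. Several points here are assumptions, since the abstract does not spell them out: the 4-tuple is taken to be (first element, last element, maximum, minimum), the lower bound is taken to be the largest absolute difference between corresponding features, and the time warping distance uses a max-based (L-infinity) step cost. The function names are hypothetical, and the linear pass over feature vectors stands in for the multi-dimensional index that the paper actually uses.

```python
def dtw_linf(s, q):
    """Time warping distance with a max-based (L-infinity) step cost.

    Assumption: this cost model is chosen for illustration only.
    """
    n, m = len(s), len(q)
    inf = float("inf")
    # d[i][j] holds the distance between the prefixes s[:i] and q[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
            d[i][j] = max(abs(s[i - 1] - q[j - 1]), best_prev)
    return d[n][m]


def features(seq):
    """Assumed 4-tuple: (first, last, max, min). Repeating elements,
    which is what warping does, leaves all four values unchanged."""
    return (seq[0], seq[-1], max(seq), min(seq))


def lower_bound(s, q):
    """Assumed form of the lower-bound distance: the largest absolute
    difference between corresponding features. It never exceeds
    dtw_linf(s, q) because the first elements are always aligned, the
    last elements are always aligned, and the maximum (minimum) of one
    sequence must be aligned with some element no larger (smaller) than
    the maximum (minimum) of the other."""
    return max(abs(a - b) for a, b in zip(features(s), features(q)))


def range_search(database, query, eps):
    """Filter with the cheap lower bound, then refine with the exact
    distance. A real system would replace this scan with an index
    lookup over the feature vectors."""
    hits = []
    for seq in database:
        if lower_bound(seq, query) > eps:
            continue  # safe to discard: exact distance >= lower bound > eps
        if dtw_linf(seq, query) <= eps:
            hits.append(seq)
    return hits
```

Because the lower bound never overestimates the exact distance under these assumptions, a sequence discarded by the filter cannot be a true answer, which is the no-false-dismissal property the abstract claims. Candidates that pass the filter still need the exact distance computed, so the filter only reduces work; it does not replace the final check.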

Bibliographic details

Source
Information Systems, 2004, No. 5, pp. 405-420 (16 pages)
Authors
Sang-Wook Kim; Sanghyun Park; Wesley W. Chu
Affiliation

College of Information and Communications, Hanyang University, 17 Haengdang, Seongdong, Seoul 133-791, South Korea

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Chinese Library Classification: computing technology, computer technology
Keywords
similarity search; sequence database; indexing; time warping distance

Added to database: 2022-08-18 02:48:13

Similar literature

1. Similarity search of time-warped subsequences via a suffix tree [J]. Sanghyun Park, Wesley W. Chu, Jeehee Yoon. Information Systems, 2003, No. 7.
2. Piers: An Efficient Model for Similarity Search in DNA Sequence Databases [J]. Xia Cao, Shuai Cheng Li, Beng Chin Ooi. SIGMOD Record, 2004, No. 2.
3. Efficient Similarity Search for Multi-Dimensional Time Sequences [J]. Sangjun Lee, Jisook Park. International Journal of Wavelets, Multiresolution and Information Processing, 2010, No. 3.
4. An index-based approach for similarity search supporting time warping in large sequence databases [C]. Sang-Wook Kim, Sanghyun Park, Chu. 2001.
5. Sequence and structure similarity search in biological and XML databases [D]. Aghili, S. Alireza. 2005.
6. A new method to analyze protein sequence similarity using Dynamic Time Warping [O]. Wenbing Hou, Qiuhui Pan, Qianying Peng.
7. Efficient Processing of Similarity Search Under Time Warping in Sequence Databases: An Index-Based Approach [O]. Sang-wook Kim, Sanghyun Park, Wesley W. Chu. 2004.
