Reference-Based Indexing of Sequence Databases

Abstract

We consider the problem of similarity search in a very large sequence database with edit distance as the similarity measure. Given limited main memory, our goal is to develop a reference-based index that reduces the number of costly edit distance computations in order to answer a query. The idea in reference-based indexing is to select a small set of reference sequences that serve as a surrogate for the other sequences in the database. We consider two novel strategies for selecting references as well as a new strategy for assigning references to database sequences. Our experimental results show that our selection and assignment methods far outperform competitive methods. For example, our methods prune up to 20 times as many sequences as the Omni method, and as many as 30 times as many sequences as frequency vectors. Our methods also scale nicely for databases containing many and/or very long sequences.
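The abstract does not spell out how references avoid edit distance computations. The usual mechanism, since edit distance is a metric, is the triangle inequality: distances from each database sequence to the references are precomputed, and at query time they give lower and upper bounds on the true distance. The following is a minimal sketch under that assumption, not the authors' method. The function names (`build_index`, `range_search`) are hypothetical, the references are picked by hand, and every reference is assigned to every sequence, whereas the paper's contribution is precisely the selection and assignment strategies that this sketch leaves out.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def build_index(database: List[str], references: List[str]) -> List[List[int]]:
    """Precompute the distance from every database sequence to every reference."""
    return [[edit_distance(s, r) for r in references] for s in database]


def range_search(query: str, radius: int, database: List[str],
                 references: List[str],
                 index: List[List[int]]) -> Tuple[List[str], int]:
    """Return the sequences within `radius` of `query`, plus the number of
    query-to-database edit distance computations that could not be avoided."""
    q_to_ref = [edit_distance(query, r) for r in references]
    matches: List[str] = []
    verified = 0
    for s, s_to_ref in zip(database, index):
        # Triangle inequality: |d(q,r) - d(s,r)| <= d(q,s) <= d(q,r) + d(s,r)
        lower = max(abs(dq - ds) for dq, ds in zip(q_to_ref, s_to_ref))
        upper = min(dq + ds for dq, ds in zip(q_to_ref, s_to_ref))
        if lower > radius:
            continue                # pruned: cannot be within the radius
        if upper <= radius:
            matches.append(s)       # accepted without computing d(q, s)
            continue
        verified += 1               # bounds inconclusive: compute the distance
        if edit_distance(query, s) <= radius:
            matches.append(s)
    return matches, verified


# Toy data (hypothetical); references are chosen by hand here.
database = ["ACGT", "ACGA", "TTTTTTTT", "ACG"]
references = ["ACGT", "TTTTTTTT"]
index = build_index(database, references)
matches, verified = range_search("ACGT", 1, database, references, index)
```

The more sequences the lower bound eliminates, the fewer full dynamic programs have to run, which is why the choice of references matters.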

Bibliographic record

Source
32nd International Conference on Very Large Data Bases (VLDB 2006), vol. 2 | 2006 | pp. 906-917 | 12 pages
Conference location: Seoul (KR)
Authors
Jayendra Venkateswaran; Deepak Lachwani; Tamer Kahveci; Christopher Jermaine;
Author affiliation

CISE Department, University of Florida, Gainesville, FL 32611

Original format: PDF
Text language: English
Chinese Library Classification: TP311.13

Similar literature

1. Indexing Musical Sequences in Large Datasets Using Relational Databases [J]. Aleksey Charapko, Ching-Hua Chuan. International journal of multimedia data engineering & management. 2015, No. 2

2. Indexing for Large DNA Database sequences [J]. Mahmoud Saheb, Samer Wohoush. International Journal of Biometric and Bioinformatics. 2011, No. 4

3. SPIND: a reference-based auto-indexing algorithm for sparse serial crystallography data [J]. Li C., Li X., Kirian R., IUCrJ. 2019, No. 1

4. Reference-Based Indexing of Sequence Databases [C]. Jayendra Venkateswaran, Deepak Lachwani, Tamer Kahveci, International Conference on Very Large Data Bases. 2006

5. Indexing techniques for similarity searches in sequence databases [D]. Park, Sanghyun. 2000

6. SPIND: a reference-based auto-indexing algorithm for sparse serial crystallography data [O]. Chufeng Li, Xuanxuan Li, Richard Kirian, 2019

7. Reference-Based Alignment in Large Sequence Databases [O]. Panagiotis Papapetrou, Vassilis Athitsos, George Kollios, 2011
