
Effective indexing and filtering for similarity search in large biosequence databases


Abstract

We present a multi-dimensional indexing approach for fast sequence similarity search in DNA and protein databases. In particular, we propose effective transformations of subsequences into numerical vector domains and build efficient index structures on the transformed vectors. We then define distance functions in the transformed domain and examine properties of these functions. We experimentally compared their (a) approximation quality for k-Nearest Neighbor (k-NN) queries, (b) pruning ability, and (c) approximation quality for ε-range queries. Results for k-NN queries, which we present here, show that our proposed distances FD2 and WD2 (i.e., Frequency and Wavelet Distance functions for 2-grams) perform significantly better than the others. We then develop effective index structures, based on R-trees and scalar quantization, on top of transformed vectors and distance functions. Promising results from the experiments on real biosequence data sets are presented.
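The general idea of mapping subsequences to 2-gram frequency vectors and answering k-NN queries in that vector space can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not define FD2 or WD2, so a plain L1 distance between 2-gram count vectors stands in for them, and a brute-force scan stands in for the R-tree and scalar-quantization indexes. The DNA alphabet, the window length `w`, and all function names (`two_gram_vector`, `l1_distance`, `windows`, `knn`) are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Assumption: a 4-letter DNA alphabet, giving 16 possible 2-grams (dimensions).
ALPHABET = "ACGT"
TWO_GRAMS = ["".join(p) for p in product(ALPHABET, repeat=2)]


def two_gram_vector(seq):
    """Map a sequence to its 16-dimensional vector of 2-gram counts."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return [counts[g] for g in TWO_GRAMS]


def l1_distance(u, v):
    """Plain L1 distance; a stand-in, not the paper's FD2 or WD2."""
    return sum(abs(a - b) for a, b in zip(u, v))


def windows(seq, w):
    """All (offset, subsequence) pairs of length w in seq."""
    return [(i, seq[i:i + w]) for i in range(len(seq) - w + 1)]


def knn(query, database, w, k):
    """Brute-force k-NN over length-w windows, ranked by vector distance."""
    q = two_gram_vector(query)
    scored = [(l1_distance(q, two_gram_vector(s)), i)
              for i, s in windows(database, w)]
    return sorted(scored)[:k]
```

For example, `knn("ACGT", "TTACGTTT", w=4, k=1)` ranks the window at offset 2 first, since its 2-gram vector is identical to the query's. In a real system the vectors would be computed once and stored in an index rather than recomputed per query.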


Bibliographic record

Source: 2003, pp. 359-366 (8 pages)
Authors: Ozturk, O.; Ferhatosmanoglu, H.
Original format: PDF
Chinese Library Classification: Radio electronics, telecommunications technology
Keywords: biology computing; DNA; proteins; vectors; trees (mathematics); similarity search; large biosequence databases; protein databases; DNA databases; wavelet distance functions; R-trees; scalar quantization; effective index structures; real biosequence data

Similar literature

1. Provably Sensitive Indexing Strategies for Biosequence Similarity Search [J]. Jeremy Buhler. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology. 2003, no. 3-4.
2. Vector Space Indexing for Biosequence Similarity Searches [J]. Ozgur Ozturk, Hakan Ferhatosmanoglu. International Journal of Artificial Intelligence Tools: Architectures, Languages, Algorithms. 2005, no. 5.
3. An efficient similarity search based on indexing in large DNA databases [J]. In-Seon Jeong, Kyoung-Wook Park, Seung-Ho Kang. Computational Biology and Chemistry. 2010, no. 2.
4. Effective indexing and filtering for similarity search in large biosequence databases [C]. Ozgur Ozturk, Hakan Ferhatosmanoglu. Institute of Electrical and Electronics Engineers Symposium on Bioinformatics and Bioengineering. 2003.
5. Indexing techniques for similarity searches in sequence databases [D]. Park, Sanghyun. 2000.
6. Biosequence Similarity Search on the Mercury System [O]. Praveen Krishnamurthy, Jeremy Buhler, Roger Chamberlain.
7. Effective Indexing and Filtering for Similarity Search in Large Biosequence Databases [O]. Ozgur Ozturk, Hakan Ferhatosmanoglu. 2003.
