Detecting protein sequence conservation via metric embeddings

Abstract

Motivation: Comparing two protein databases is a fundamental task in biosequence annotation. Given two databases, one must find all pairs of proteins that align with high score under a biologically meaningful substitution score matrix, such as a BLOSUM matrix (Henikoff and Henikoff, 1992). Distance-based approaches to this problem map each peptide in the database to a point in a metric space, such that peptides aligning with higher scores are mapped to closer points. Many techniques exist to discover close pairs of points in a metric space efficiently, but the challenge in applying this work to proteomic comparison is to find a distance mapping that accurately encodes all the distinctions among residue pairs made by a proteomic score matrix. Buhler (2002) proposed one such mapping but found that it led to a relatively inefficient algorithm for protein-protein comparison.

Results: This work proposes a new distance mapping for peptides under the BLOSUM matrices that permits more efficient similarity search. We first propose a new distance function on peptides derived from a given score matrix. We then show how to map peptides to bit vectors such that the distance between any two peptides is closely approximated by the Hamming distance (i.e. the number of mismatches) between their corresponding bit vectors. We combine these two results with the LSH-ALL-PAIRS-SIM algorithm of Buhler (2002) to produce an improved distance-based algorithm for proteomic comparison. An initial implementation of the improved algorithm exhibits sensitivity within 5% of that of the original LSH-ALL-PAIRS-SIM, while running up to eight times faster.

Availability: The source code can be found at http://www.eecs.berkeley.edu/-eran/projects/embed.
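The general idea of embedding peptides as bit vectors and then hashing them can be illustrated with a minimal Python sketch. It is not the embedding or the LSH-ALL-PAIRS-SIM algorithm from the paper: the per-residue codes in `TOY_CODES` are made up for illustration (they are not derived from a BLOSUM matrix), and the function names and the bit-sampling hash scheme are assumptions chosen only to show how Hamming distance and locality-sensitive hashing fit together.

```python
import random
from collections import defaultdict
from itertools import combinations

# Hypothetical 4-bit codes for a handful of residues. These are NOT derived
# from BLOSUM; they only mimic the idea that similar residues get close codes.
TOY_CODES = {
    "A": "0000",
    "S": "0001",
    "T": "0011",
    "L": "1100",
    "I": "1110",
    "V": "1111",
}


def embed(peptide):
    """Concatenate the per-residue codes into one bit string."""
    return "".join(TOY_CODES[residue] for residue in peptide)


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have equal length")
    return sum(x != y for x, y in zip(a, b))


def lsh_candidate_pairs(peptides, bits_per_hash=4, num_tables=8, seed=0):
    """Bit-sampling LSH: peptides whose embeddings agree on a random subset
    of bit positions land in the same bucket and become candidate pairs."""
    vectors = {p: embed(p) for p in peptides}
    if not vectors:
        return set()
    n_bits = len(next(iter(vectors.values())))
    rng = random.Random(seed)
    candidates = set()
    for _ in range(num_tables):
        positions = rng.sample(range(n_bits), bits_per_hash)
        buckets = defaultdict(list)
        for peptide, vector in vectors.items():
            key = "".join(vector[i] for i in positions)
            buckets[key].append(peptide)
        for group in buckets.values():
            for a, b in combinations(sorted(group), 2):
                candidates.add((a, b))
    return candidates


if __name__ == "__main__":
    peptides = ["AST", "ASS", "LIV"]
    for a, b in combinations(peptides, 2):
        print(a, b, hamming(embed(a), embed(b)))
    print(sorted(lsh_candidate_pairs(peptides)))
```

Under these toy codes, peptides made of similar residues differ in few bits, so they are more likely to share a bucket in at least one hash table. Candidate pairs would still need to be verified with a real alignment score.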

Bibliographic details

Source
International Conference on Intelligent Systems for Molecular Biology | 2003 | 8 pages
Authors
E. Halperin; J. Buhler; R. Karp; R. Krauthgamer; B. Westover;
Original format: PDF
Chinese Library Classification: Q811.4-532
Keywords
protein comparison; database indexing; metric embedding; Hamming space;

Similar literature

1. Patterns of Sequence Conservation in the S-Layer Proteins and Related Sequences in Clostridium difficile [J]. Emanuela Calabi, Neil Fairweather. Journal of Bacteriology. 2002, No. 14
2. An integrated approach to the analysis and modeling of protein sequences and structures. III. A comparative study of sequence conservation in protein structural families using multiple structural alignments. [J]. Yang AS, Honig B. Journal of Molecular Biology. 2000, No. 3
3. AdoMet radical proteins—from structure to evolution—alignment of divergent protein sequences reveals strong secondary structure element conservation [J]. Yvain Nicolet, Catherine L. Drennan. Nucleic Acids Research. 2004, No. 13
4. Detecting protein sequence conservation via metric embeddings [C]. E. Halperin, J. Buhler, R. Karp. International Conference on Intelligent Systems for Molecular Biology. 2003
5. An ultrasonic approach for nondestructive testing of deteriorating infrastructure: Use of direct-sequence spread spectrum ultrasonic evaluation to detect embedded steel deterioration. [D]. Rens, Kevin Lee. 1994
6. Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences [O]. David C. King, James Taylor, Laura Elnitski. 2005
7. Detecting protein sequence conservation via metric embeddings [O]. E. Halperin, J. Buhler, R. Karp. 2003
